Bug 75985

Summary: [NVC1] HDMI audio device only visible after rescan
Product: xorg
Reporter: Jean-Louis Dupond <jean-louis>
Component: Driver/nouveau
Assignee: Nouveau Project <nouveau>
Status: RESOLVED MOVED
QA Contact: Xorg Project Team <xorg-team>
Severity: normal
Priority: medium
CC: andrey+freedesktop, dan, dennis.lissov, fedevx, hhfeuer, jan.public, jean-louis, lopin, lukas, peter, projekte.freedesktop, prymoo, rdoursenaud, sim.herter, vkrevs, zigarrre
Version: unspecified
Hardware: x86-64 (AMD64)
OS: Linux (All)
Whiteboard:
i915 platform:
i915 features:
Attachments (description; flags: none for all):
- acpidump
- dmesg recording from a t520 trying the workaround
- GP104: enable HDMI audio device function
- Kernel module to toggle audio function
- Kernel module to toggle audio function
- PCI: Enable power to Nvidia HDA controllers on device enumeration
- [PATCH 1/3] PCI: Expose Nvidia HDA controllers
- [PATCH 2/3] PCI: Apply quirks on runtime resume despite being unbound
- [PATCH 3/3] ALSA: hda - Broaden VGA class matching
- Patch to simulate hidden HDA on systems which normally expose it
- Acer Predator G3-572 acpidump
- HDA runtime PM debug patch for v5.3
- Dmesg dump to present the problem of NVIDIA HDA not suspending correctly.
- Dmesg dump to present the problem of NVIDIA HDA not suspending correctly.
- HDA runtime PM debug patch #2 for v5.3
- Dmesg dump to present the problem of NVIDIA HDA not suspending correctly #2.
- Debug patch to log invocations of pm_runtime_forbid()
- Dmesg dump

Description Jean-Louis Dupond 2014-03-10 15:56:17 UTC
Hi

I have a Dell XPS 15 laptop with Optimus.
It has the following NVIDIA card:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 540M] (rev a1)

HDMI output has been working fine for some time now.
However, there is still an issue with HDMI audio.

On Windows, and with `lspci -H1`, the following devices are shown:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 540M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF108 High Definition Audio Controller (rev a1)

But by default, the audio device isn't visible after a clean boot, so HDMI audio doesn't work.

I've found a workaround to get it working.

- Start the system, and stop the display manager (lightdm/gdm)
- Load the nouveau module
- Remove the Nvidia card from the PCI bus (echo 1 > /sys/bus/pci/devices/0000\:01\:00.0/remove)
- Rescan the PCI bus (echo "1" > /sys/bus/pci/rescan)

And there it is; the audio device is now visible:
01:00.0 VGA compatible controller: NVIDIA Corporation GF108M [GeForce GT 540M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF108 High Definition Audio Controller (rev a1)

Also dmesg shows the following:
[  152.360864] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input18
[  152.361210] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input17
[  152.361415] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input16
[  152.361592] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input15

So the device is loaded correctly.
Now if we start lightdm/gdm/whatever again and log in (which also starts PulseAudio), HDMI audio works correctly, as it should :)

A note:
If we do the rescan without nouveau module loaded, the workaround does not work. So we need to have nouveau loaded to make the Audio device visible.
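The sysfs part of the workaround (remove, then rescan) can be sketched in C as below. This is a minimal illustration that assumes root privileges and that nouveau is already loaded, as noted above. The helper names sysfs_write_one() and remove_and_rescan() are hypothetical, and the attribute paths are passed in as arguments so the code can also be exercised against ordinary files.

```c
#include <stdio.h>

/* Write "1\n" to a sysfs attribute, like `echo 1 > path` does.
 * Returns 0 on success, -1 on failure. */
static int sysfs_write_one(const char *path)
{
    FILE *f = fopen(path, "w");
    if (f == NULL)
        return -1;
    int ok = fputs("1\n", f) >= 0;
    /* sysfs write errors may only surface when the buffer is flushed */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}

/* Hot-remove the GPU function, then rescan the bus so the kernel
 * sets the device up again and probes any newly visible functions. */
static int remove_and_rescan(const char *remove_attr, const char *rescan_attr)
{
    if (sysfs_write_one(remove_attr) != 0)
        return -1;
    return sysfs_write_one(rescan_attr);
}
```

For the device in this report, the call would be remove_and_rescan("/sys/bus/pci/devices/0000:01:00.0/remove", "/sys/bus/pci/rescan").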

On first boot I get the following error:
[  121.872253] nouveau 0000:01:00.0: enabling device (0006 -> 0007)
[  121.872554] [drm] hdmi device  not found 1 0 1

Might this be related?
After the rescan, it shows the following:
[  151.557038] [drm] Initialized nouveau 1.1.1 20120801 for 0000:01:00.0 on minor 1
[  151.557186] snd_hda_intel 0000:01:00.1: enabling device (0000 -> 0002)
[  151.557253] hda_intel: Disabling MSI
[  151.557274] hda-intel 0000:01:00.1: Handle VGA-switcheroo audio client
[  151.557349] hda-intel 0000:01:00.1: Disabling 64bit DMA
[  151.560778] hda-intel 0000:01:00.1: Enable delay in RIRB handling
[  152.360864] input: HDA NVidia HDMI/DP,pcm=9 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input18
[  152.361210] input: HDA NVidia HDMI/DP,pcm=8 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input17
[  152.361415] input: HDA NVidia HDMI/DP,pcm=7 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input16
[  152.361592] input: HDA NVidia HDMI/DP,pcm=3 as /devices/pci0000:00/0000:00:01.0/0000:01:00.1/sound/card1/input15
[  156.801404] hda-intel 0000:01:00.1: Disabling via VGA-switcheroo

Hopefully this can help getting the issue fixed completely :)

Thanks!
Jean-Louis
Comment 1 Ilia Mirkin 2014-03-10 16:00:06 UTC
I assume that somewhere in that list of steps you had to plug the HDMI cable in? Is it the case, at least on windows, that the PCI device appears and disappears at the same time that the HDMI cable is plugged/unplugged?

What happens if you boot with the HDMI cable plugged in?
Comment 2 Jean-Louis Dupond 2014-03-10 16:15:40 UTC
Tested this out.

On Windows:
Neither device is visible in lspci. When you then start a game, for example, the NVIDIA VGA device becomes visible, but the audio device does not.

Now if you plug in an HDMI cable, both devices (VGA and audio) become visible.

If we now remove the HDMI cable, both devices stay there for a while, and then get deactivated together.

On Linux:
If we boot with HDMI cable in, same issue.

Now if I do the pci remove/rescan without HDMI cable, same result.
The Audio device becomes visible, even without having a HDMI cable in!
Comment 3 Jean-Louis Dupond 2014-03-10 23:14:00 UTC
Created attachment 95562 [details]
acpidump
Comment 4 Jean-Louis Dupond 2014-03-10 23:14:31 UTC
All hotplug options enabled in kernel:
CONFIG_MEMORY_HOTPLUG=y
CONFIG_MEMORY_HOTPLUG_SPARSE=y
CONFIG_HOTPLUG_CPU=y
# CONFIG_BOOTPARAM_HOTPLUG_CPU0 is not set
# CONFIG_DEBUG_HOTPLUG_CPU0 is not set
CONFIG_ARCH_ENABLE_MEMORY_HOTPLUG=y
CONFIG_ACPI_HOTPLUG_CPU=y
CONFIG_ACPI_HOTPLUG_MEMORY=y
CONFIG_HOTPLUG_PCI_PCIE=y
CONFIG_HOTPLUG_PCI=y
CONFIG_HOTPLUG_PCI_ACPI=y
CONFIG_HOTPLUG_PCI_ACPI_IBM=m
CONFIG_HOTPLUG_PCI_CPCI=y
CONFIG_HOTPLUG_PCI_CPCI_ZT5550=m
CONFIG_HOTPLUG_PCI_CPCI_GENERIC=m
CONFIG_HOTPLUG_PCI_SHPC=m
CONFIG_XEN_BALLOON_MEMORY_HOTPLUG=y
Comment 5 Jethro Beekman 2014-09-15 23:52:50 UTC
I've been having a similar issue for a while, see this thread: https://www.mail-archive.com/dri-devel@lists.freedesktop.org/msg45935.html

The workaround you propose (loading nouveau, removing the device, rescanning the bus) works.
Comment 6 Jethro Beekman 2014-09-16 06:33:28 UTC
Never mind: the workaround lists the device, and ALSA/Pulse seem to recognize it, but no audio gets through. I need to do more debugging.

I found this patch, it seems relevant: http://cgit.freedesktop.org/nouveau/linux-2.6/commit/?h=linux-3.18&id=cc2a9071458254cb0db6153811734750da0233ea

FYI the nouveau driver doesn't actually output any graphics on my machine (see https://bugs.launchpad.net/ubuntu/+source/xserver-xorg-video-nouveau/+bug/1292036 ), I'm just here to get audio working ;)
Comment 7 Peter Wu 2016-07-16 17:20:20 UTC
Possibly interesting information from
https://devtalk.nvidia.com/default/topic/609790/no-hdmi-sound-w-optimus-in-linux/

On Windows, the audio device is normally not visible, but when an HDMI cable is inserted, the audio device suddenly appears. Perhaps the Windows driver is doing some rescanning automatically?


Related: 20/111 Nvidia devices report an audio device. AFAIK newer machines have their HDMI port connected to the Nvidia card, so something is probably missing there. See https://lists.freedesktop.org/archives/nouveau/2016-July/025619.html
Comment 8 Peter Wu 2016-11-05 22:30:27 UTC
Following these steps I always see an audio function (01:00.1):

1. Power off laptop
2. Insert miniDP cable (HDMI probably works as well, did not test)
3. Power on laptop
4. Check lspci -s1:

When booting the laptop without cable there is some other weird behavior:
1. Power on laptop
2. Run "lspci -s1: ; lspci -H1 -s:"
   (first command also resumes the dGPU as side-effect)
   Expected: 01:00.0 listed twice
3. Wait 5 seconds for the card to runtime suspend (important, verify that this is actually the case, e.g. by watching "dmesg -w").
4. Insert cable
5. Run the above two lspci commands again. Now I also see the audio function (01:00.1).
6. Repeat step 3, remove audio cable, repeat step 5 (now the audio function is gone again!)

This was tested on a Clevo P651RA (GTX 965M). Weird!
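Step 3 asks to verify that the card has actually runtime suspended. Besides watching dmesg, the runtime_status attribute in the PCI device's power/ directory in sysfs reports the current runtime PM state. A minimal C sketch, assuming a kernel built with CONFIG_PM; is_runtime_suspended() is a hypothetical helper that takes the attribute path as an argument:

```c
#include <stdio.h>
#include <string.h>

/* Read a runtime PM status attribute and report whether it says
 * "suspended". Returns 1 if suspended, 0 otherwise, -1 on error. */
static int is_runtime_suspended(const char *path)
{
    char buf[32] = {0};
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fgets(buf, sizeof buf, f) == NULL) {
        fclose(f);
        return -1;
    }
    fclose(f);
    buf[strcspn(buf, "\n")] = '\0';   /* strip the trailing newline */
    return strcmp(buf, "suspended") == 0;
}
```

For the GPU here, the path would be /sys/bus/pci/devices/0000:01:00.0/power/runtime_status. Transitional values such as "suspending" or "resuming" are treated as not suspended by this sketch.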
Comment 9 zigarrre 2016-12-22 23:04:45 UTC
Created attachment 128640 [details]
dmesg recording from a t520 trying the workaround
Comment 10 zigarrre 2016-12-22 23:05:03 UTC
Any updates?

I can confirm this bug on my Lenovo Thinkpad T520 (4242-PT2). For testing I used an FHD TV connected via a DP-to-HDMI adapter. The hardware setup was tested and confirmed to work as expected with Windows 7, so a hardware cause can be ruled out.

uname -a
Linux t520 4.8.13-1-ARCH #1 SMP PREEMPT Fri Dec 9 07:24:34 CET 2016 x86_64 GNU/Linux

I am using the optimus implementation provided by nouveau together with the intel graphics driver.

xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x8c cap: 0xb, Source Output, Sink Output, Sink Offload crtcs: 3 outputs: 3 associated providers: 0 name:Intel
Provider 1: id: 0x66 cap: 0x7, Source Output, Sink Output, Source Offload crtcs: 2 outputs: 5 associated providers: 0 name:nouveau

'lspci' doesn't show the audio device but 'lspci -H1' does:
01:00.0 VGA compatible controller: NVIDIA Corporation GF119M [Quadro NVS 4200M] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GF119 HDMI Audio Controller (rev a1)

Even 'lspci -H1' shows the audio device only when the discrete GPU is powered up (e.g. by doing 'lspci; lspci -H1').

I am not using a DM or a DE but bspwm launched via startx from the tty.

The procedure I used:
1) boot and login (x not yet started)
2) modprobe nouveau
3) echo 1 > /sys/bus/pci/devices/0000:01:00.0/remove
4) echo 1 > /sys/bus/pci/rescan
5) lspci now shows the audio device
6) startx
7) open pavucontrol (to trigger launch of pulseaudio via rtkit)

It made no difference when, or even whether, a device was connected to the DP port. The output wasn't usable even with this workaround, as PulseAudio always showed it as unplugged. On a single attempt it did show as connected and worked OK, but despite following the procedure exactly and trying many times, I was not able to reproduce this.

After starting pulseaudio multiple errors of the form 'kernel: snd_hda_codec_hdmi hdaudioC2D0: out of range cmd 0:5:707:ffffffff' showed up.

Using this workaround also led to a very unstable system until the next boot. I experienced CPU soft lockups multiple times (though I found no way to reliably reproduce them), from which no recovery except a hard reset of the computer was possible.

Attached is a dmesg log where I follow the above procedure (connecting the display between steps 3 and 4) and activating/deactivating the display with xrandr once in the end.
I could also provide a syslog from boot till shutdown recorded with journald if required.

If I can help fixing this by providing more information or testing things let me know.
Comment 11 Maik Freudenberg 2017-09-26 10:04:36 UTC
Some interesting information from Aaron Plattner of Nvidia:
https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-over-hdmi-only-hda-intel-detected-azalia/post/5211273/#5211273
Comment 12 Peter Wu 2017-09-26 22:50:28 UTC
(In reply to Maik Freudenberg from comment #11)
> Some interesting information from Aaron Plattner of Nvidia:
> https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-
> over-hdmi-only-hda-intel-detected-azalia/post/5211273/#5211273

Interesting bit about the PCI configuration space. Aaron's command writes to offset 0x488 of the GPU's config space; on my laptop that area lies somewhere after the AER capability and is all zeroes, so it does not seem to make sense.

Aaron's command is essentially the above remove/rescan, but including:
setpci -s 01:00.0 0x488.l=0x2000000:0x2000000
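In setpci's value:mask syntax, only the bits set in the mask are changed; all other bits of the register keep their current value. 0x2000000 is bit 25, so the command sets that single bit of the 32-bit (.l) register at offset 0x488. The read-modify-write can be sketched as follows; masked_write() and the macro names are illustrative, not part of pciutils:

```c
#include <stdint.h>

#define NV_HDA_ENABLE_REG 0x488u       /* GPU config-space offset from Aaron's command */
#define NV_HDA_ENABLE_BIT (1u << 25)   /* == 0x02000000 */

/* Emulate setpci's "reg.l=value:mask": bits set in mask take the
 * corresponding bits of value, all other bits are preserved. */
static uint32_t masked_write(uint32_t old, uint32_t value, uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}
```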
Comment 13 Maik Freudenberg 2017-09-28 12:33:51 UTC
(In reply to Peter Wu from comment #12)
> On my laptop that area is somewhere after the AER
> capability and is all zeroes, it does not seem to make sense.

Why do you mention AER capability in this context? Are associated registers normally the highest used in PCI config space? Could you give a hint on that?
Seen from a hardware perspective, this could make sense. Since the audio is another PCI function that is not always needed, it surely has some enable pin on the chip so it can be toggled or disabled by some external, vendor-specific circuitry. And some (reserved) GPIO pin is used to make it switchable through software. So maybe config space at >= 0x400 is GPIO space?
Wild guess, of course.
It would be interesting to know whether, after a write, the value can be read back or returns zero again.
What about desktop cards, do they also have this register? Is this always set then?
Comment 14 Maik Freudenberg 2017-09-30 02:45:54 UTC
(In reply to Peter Wu from comment #12)
> AER capability
Red herring?
Comment 15 Peter Wu 2017-10-04 21:39:53 UTC
The AER capability was the closest one I could see just before the magic value of 0x488, it likely has nothing to do with 0x488. I haven't tried writing/reading it yet.

You might also be interested in this document:
https://github.com/envytools/envytools/blob/master/rnndb/bus/pci.xml
Comment 16 Ilia Mirkin 2017-10-12 15:04:47 UTC
So an interesting thought here is that GRUB also supports the setpci command, and runs early enough for it to matter. This would be the functional equivalent of an early pci quirk. If someone having the issue could try adding

setpci -s 01:00.0 0x488.l=0x2000000:0x2000000

to be run by grub, and reporting back whether that helped or not, with `lspci -nn -d 10de:` output, that'd be interesting to see. [Not everyone can use 01:00.0 -- double-check where your GPU is at. But that'll be right for most people.]
Comment 17 Maik Freudenberg 2017-10-14 19:36:25 UTC
(In reply to Ilia Mirkin from comment #16)
> So an interesting thought here is that GRUB also supports the setpci
> command
Really smart thinking, but users who tried it had no success: the register always returned 0xFFFFFFFF, which would normally(?) mean no device is there.
That led me to dig a bit into setpci, and it seems to me that without the kernel and its MMIO, setpci can only access the standard 256 bytes of PCI config space, not the extended config space. So out of luck there?
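A sketch of why an offset like 0x488 is out of reach for the legacy access method: PCI Configuration Mechanism #1 (I/O ports 0xCF8/0xCFC) encodes the register number in only 8 bits, so it can address offsets 0x00 to 0xFF of each function, while 0x488 lies in the PCIe extended space that is normally accessed through memory-mapped configuration (ECAM). That GRUB's setpci uses this legacy mechanism is an assumption based on the behaviour reported here, and the sketch ignores vendor-specific extensions some chipsets provide. cf8_address() is a hypothetical helper:

```c
#include <stdbool.h>
#include <stdint.h>

/* Build the Configuration Mechanism #1 address written to port 0xCF8:
 * enable bit 31, bus 23:16, device 15:11, function 10:8, register 7:2.
 * Returns false for offsets that this encoding cannot express. */
static bool cf8_address(uint8_t bus, uint8_t dev, uint8_t fn,
                        uint16_t offset, uint32_t *out)
{
    if (dev > 31 || fn > 7 || offset > 0xFF)
        return false;
    *out = 0x80000000u | ((uint32_t)bus << 16) | ((uint32_t)dev << 11) |
           ((uint32_t)fn << 8) | (offset & 0xFCu);
    return true;
}
```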
Comment 18 Fede 2017-10-15 04:14:11 UTC
(In reply to Maik Freudenberg from comment #17)
> (In reply to Ilia Mirkin from comment #16)
> > So an interesting thought here is that GRUB also supports the setpci
> > command
> Really smart thinking but users trying it had no success. Register always
> returning value 0xFFFFFFFF. Which would normally(?) mean, no device there.
> Which led me to dig a bit into setpci and it seems to me that without the
> kernel and its mmio, setpci is only able to work on the standard 256 pci
> registers but not the extended config space. So out of luck there?


I can confirm this. It does not work from GRUB. The best work-around so far is to have a service do the setpci step before the nvidia drivers are loaded and display-manager.service starts (for those on systemd).
Comment 19 Denis Lisov 2017-11-10 23:32:17 UTC
Having the same bug.

Thinkpad P50, Quadro M1000M.

01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM107GLM [Quadro M1000M] [10de:13b1] (rev a2)

Ready to test patches.
Comment 20 Maik Freudenberg 2017-11-11 00:59:10 UTC
(In reply to Denis Lisov from comment #19)
> Ready to test patches.
Try the workaround in
https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-over-hdmi-only-hda-intel-detected-azalia/post/5211273/#5211273
for now.
Some complications have been reported: the workaround seems to affect or depend on ACPI, with backlight control not working and sleep/resume breaking audio (see the same thread). The most interesting issue was reported in
https://devtalk.nvidia.com/default/topic/1025831/linux/gtx-1060m-with-linux-no-hdmi-audio-device-/
The workaround there works on Linux 4.4 but not on 4.9; turning ACPI off makes the audio device appear on 4.9 without the workaround.
Another green ACPI hell needing kernel quirks?
Comment 21 Daniel Drake 2017-12-18 16:06:55 UTC
I checked the behaviour under Windows and it appears that Windows is doing the above-mentioned PCI config space write when an HDMI cable is plugged in. I'm fairly sure that this is done "natively" by the Windows driver (not in ACPI). More info:
https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-over-hdmi-only-hda-intel-detected-azalia/post/5227504/#5227504

What would be the best way to implement this in nouveau? I'm unsure about the conditions we would want to do this under. Maybe something like:
 1. nouveau is driving the HDMI output using its own display controller, and
 2. HDMI cable is connected? (or maybe we enable HDMI audio all the time like other platforms?)
 3. Only on GTX1060/GTX1070, which are the only cards where the 0x488 register write has been confirmed to work?
Comment 22 Daniel Drake 2017-12-22 19:15:41 UTC
Created attachment 136369 [details] [review]
GP104: enable HDMI audio device function

On Asus GL502VS I investigated why the gfx device must be removed before rescan in the above workarounds.

The reason is that when Linux first probes the device (before you attempt the workaround), pci_setup_device() notes that the device is not multifunction capable. This causes pci_scan_slot() to not bother scanning the non-zero functions (per the behaviour inside next_fn()).

I checked and I found that when the 0x488 magic bit is not set, the gfx device advertises as non-multifunction. After the bit is set, the device advertises as multi-function. So, after setting the magic bit, removing the device will cause Linux to re-probe it during the next rescan, taking note at that point that it is a multi-function device, and proceeding to scan the functions, finding the audio device at function 1.

Based on that I have a first attempt at a fix. It's not working though, audio output is silent (but I did have it working with the previous workarounds). I'll look closer next week.
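The mechanism described above can be modelled in a few lines. The multi-function flag is bit 7 (0x80) of the Header Type register at config offset 0x0E of function 0. The kernel caches it when the device is first set up, and a rescan reuses the existing device, so a change in the hardware only takes effect after the device has been removed. The struct and functions below are a simplified, hypothetical userspace model, not the actual pci_scan_slot()/next_fn() code (which also handles cases such as ARI):

```c
#include <stdbool.h>
#include <stdint.h>

#define PCI_HEADER_MF_BIT 0x80   /* bit 7 of Header Type (offset 0x0E) */

struct model_fn0 {
    uint8_t hw_header_type;  /* value the hardware currently reports */
    bool    has_pci_dev;     /* a struct pci_dev already exists */
    bool    multifunction;   /* flag cached when the device was set up */
};

/* Returns how many function numbers a (re)scan would probe: 1 or 8. */
static int model_scan_slot(struct model_fn0 *fn0)
{
    if (!fn0->has_pci_dev) {
        /* New device: read and cache the header type. */
        fn0->multifunction = (fn0->hw_header_type & PCI_HEADER_MF_BIT) != 0;
        fn0->has_pci_dev = true;
    }
    /* Existing device: the cached flag is reused, hardware is not re-read. */
    return fn0->multifunction ? 8 : 1;
}

/* Writing to the sysfs "remove" attribute drops the device. */
static void model_remove(struct model_fn0 *fn0)
{
    fn0->has_pci_dev = false;
}
```

Setting bit 25 of 0x488 corresponds to changing hw_header_type to 0x80; only a model_remove() followed by another scan makes function 1 reachable, which matches the workaround.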
Comment 23 Maik Freudenberg 2017-12-23 02:00:55 UTC
@Daniel Drake: that's consistent with my investigations: even on a 740M without any outputs, setting the bit at 0x488 changes the PCI header type to 0x80, multifunction. My idea would be to use the pci_scan_single_device function to add the sub-function device. The really old fakephp driver available in kernels <2.29 would have been useful for general testing, but it has gone AWOL. The later one is useless; I tested it. I also want to use the holiday break to do some development on this. What have you been thinking about?
Comment 24 Daniel Drake 2017-12-27 16:12:43 UTC
(In reply to Maik Freudenberg from comment #23)
> My idea would be to use the pci_scan_single_device function to add the sub-function device.

That's what I did in the above patch. I need to look closer at why my HDMI audio output is now silent, though.
Comment 25 Daniel Drake 2017-12-27 17:35:20 UTC
I found the difference. When I previously tested and found HDMI audio to be working after the setpci,unload,rescan approach, I was using the nvidia proprietary driver. HDMI audio PCI device was then detected, the ELD files show the monitor detected, and HDMI audio works.

In the patch above, I am using nouveau. The HDMI audio PCI device is now detected automatically, the HDMI video output is working and active, but the ELD files show that no HDMI monitor was detected on the audio side.

I checked with HDA verbs directly:
# hda-verb /dev/snd/hwC1D0 6 GET_PIN_SENSE 0

With the setpci+unload+rescan approach on nouveau, this always returns value 0, even with HDMI video output active and working.

With the setpci+unload+rescan approach on nvidia's proprietary drivers, this returns 0 after loading the modules. However upon starting X, it starts returning 0xc0000000 and HDMI audio output works fine.

So in addition to the games with the magic register, the nvidia proprietary driver is doing something else as well which is needed to make HDMI audio work.
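For reference, the two top bits of the Get Pin Sense response (verb 0xF09) are what distinguish the two results above. In the HDA specification, bit 31 is Presence Detect and, for digital display pins, bit 30 is ELD Valid (AC_PINSENSE_PRESENCE and AC_PINSENSE_ELDV in the Linux headers). 0xc0000000 therefore means a monitor is present and its ELD is valid, while 0 means neither. A minimal decoding sketch with illustrative helper names:

```c
#include <stdbool.h>
#include <stdint.h>

#define PINSENSE_PRESENCE (1u << 31)   /* Presence Detect */
#define PINSENSE_ELDV     (1u << 30)   /* ELD Valid (digital display pins) */

static bool pin_has_monitor(uint32_t resp)
{
    return (resp & PINSENSE_PRESENCE) != 0;
}

static bool pin_eld_valid(uint32_t resp)
{
    return (resp & PINSENSE_ELDV) != 0;
}
```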
Comment 26 Maik Freudenberg 2017-12-28 02:24:02 UTC
Created attachment 136416 [details]
Kernel module to toggle audio function

Thank you, Daniel. I was too blind to see the attachment.
Attaching some copy-paste work that uses/abuses bbswitch as a framework together with your enable code. It works like this: it creates a handle /proc/acpi/nvhda which accepts ON and OFF. Since my hardware doesn't have outputs, I can't test it further than turning it on and off.
It can be loaded and used while the proprietary driver is loaded, for further investigation:
- Does the prop. driver react to this immediately, i.e. make audio work?
- If not, does plugging out/in the hdmi cable make it work?
- Are any pci config space changes noticeable?
Comment 27 Maik Freudenberg 2017-12-28 03:00:03 UTC
Created attachment 136418 [details]
Kernel module to toggle audio function

Bug removed.
Comment 28 Raphaël Doursenaud 2018-01-03 19:34:04 UTC
Maik Freudenberg, I tried your module on my ThinkPad P51 and it successfully enabled the HDMI audio.
HDMI audio playback is now fully functional on my laptop.
I'm using it with the proprietary driver.
Thanks!
Comment 29 Karol Herbst 2018-01-03 20:20:31 UTC
(In reply to Daniel Drake from comment #22)
> Created attachment 136369 [details] [review] [review]
> GP104: enable HDMI audio device function
> 
> On Asus GL502VS I investigated why the gfx device must be removed before
> rescan in the above workarounds.
> 
> The reason is that when Linux first probes the device (before you attempt
> the workaround), pci_setup_device() notes that the device is not
> multifunction capable. This causes pci_scan_slot() to not bother scanning
> the non-zero functions (per the behaviour inside next_fn()).
> 
> I checked and I found that when the 0x488 magic bit is not set, the gfx
> device advertises as non-multifunction. After the bit is set, the device
> advertises as multi-function. So, after setting the magic bit, removing the
> device will cause Linux to re-probe it during the next rescan, taking note
> at that point that it is a multi-function device, and proceeding to scan the
> functions, finding the audio device at function 1.
> 
> Based on that I have a first attempt at a fix. It's not working though,
> audio output is silent (but I did have it working with the previous
> workarounds). I'll look closer next week.

I think this check against 0x134 can be something like < 0x82 instead. This thing has been there since about the G82 GPUs, and I am sure the HDMI audio device is controlled like this on all those GPUs.
Comment 30 Ilia Mirkin 2018-01-03 20:25:34 UTC
(In reply to Karol Herbst from comment #29)
> I think this check against 0x134 can be something like < 0x82 instead. This
> thing is there since like G82 GPUs and I am sure the HDMI audio device is
> controlled like this on all those GPUs.

Separate audio function started with GT215 (0xA3) and continues on GT216 (0xA5) and GT218 (0xA8). Note that it's not available on MCP77/79 (0xAA/0xAC) but I'm fairly sure *is* available on MCP89 (0xAF).
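Ilia's list could be encoded as a chipset check along these lines. This is a hypothetical helper using nouveau's chipset numbering as quoted above. Treating every chipset from 0xC0 (Fermi) onwards as having the function is an assumption based on the Fermi and later cards reported in this bug; as written, it would also wrongly include SoC parts such as Tegra, which a real check would have to exclude:

```c
#include <stdbool.h>

static bool has_hda_function(unsigned chipset)
{
    switch (chipset) {
    case 0xA3: case 0xA5: case 0xA8:   /* GT215, GT216, GT218 */
    case 0xAF:                         /* MCP89 ("fairly sure") */
        return true;
    case 0xAA: case 0xAC:              /* MCP77/79: no separate function */
        return false;
    default:
        /* Assumption: Fermi (0xC0) and later; earlier parts have none. */
        return chipset >= 0xC0;
    }
}
```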
Comment 31 Maik Freudenberg 2018-01-03 21:27:00 UTC
As a side note: another user reported 5 W of additional power draw when enabling the audio function. Whether or not that figure is accurate, it argues against enabling the function unconditionally, since these are mobile devices.

@Raphaël Doursenaud
in which order did you do the following:
boot
plug in cable
start X
load module
turn on audio
?
Comment 32 Martin Lopatář 2018-01-04 07:51:32 UTC
Hello.
I have Dell XPS 17 L702X with discrete GPU GF106M [GeForce GT 555M] [10de:0dcd].
I have been trying to investigate the HDMI audio issue for some time, but I'm a complete novice in all of it: kernel development, power management, PCI, ACPI.
On Windows, the audio device appears/disappears as I plug/unplug the HDMI cable, and the MF (multi-function) bit in PCI config space changes accordingly.
Linux boots without the audio device detected and with the MF bit not set, but my notebook has an ACPI _DSM function for enabling/disabling the discrete GPU including the audio device.
To be precise, the enabling/disabling is not done using _DSM function, but using _DSM combined with _PS0 and _PS3.
So I was able to get HDMI audio working using the ACPI _DSM and remove+rescan via sysfs workaround. Since I modified kernel to ignore cached value of boot-time MF bit when doing rescan, removing the device via sysfs was not needed anymore (only rescan after enabling using ACPI _DSM).
Using _DSM I was able to disable/enable the GPU and audio together, but not individually like Windows can. I was curious what magic Windows does, and ideally I want to contribute to Linux by implementing this missing feature using the same approach Windows uses. I have just found this thread and saw that someone is working on it, so I can at least provide some information about the behavior in my environment so far.

Laptop model: Dell XPS 17 L702X
lspci output relevant lines:
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GF106M [GeForce GT 555M] [10de:0dcd] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation GF106 High Definition Audio Controller [10de:0be9] (rev a1)

"setpci -s 01:00.0 0x488.l=0x2000000:0x2000000" is working - sets the MF bit in 01:00.0 and 01:00.1 PCI devices config space.
I went through ACPI AML code and see that _DSM,_PS0,_PS3 functions on \_SB.PCI0.PEG0.PEGP do this as well (writing to 01:00.0 0x488).
I had an idea to do this in GRUB too (hoping that everything will work without need of any additional steps or kernel changing :)),
but like Maik Freudenberg [Comment 17], I found that in grub the extended config space is not accessible.
But because it is memory-mapped, I tried to do the same (as 'setpci'), but using 'write_dword' grub command and succeeded!

Calling "write_dword 0xf8100488 0x02000000 0x02000000" (in my case) in grub results in MF bit set so both GPU and audio functions are discovered at boot time without any kernel change needed, so enabling audio function in grub IS POSSIBLE!
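The address used with write_dword follows the PCIe ECAM layout: each bus gets 1 MiB, each device 32 KiB and each function 4 KiB of the memory-mapped window. Assuming an ECAM base of 0xF8000000 (inferred from the address above; it differs between machines and comes from the ACPI MCFG table, shown for example as the "PCI MMCONFIG" region in /proc/iomem), bus 1, device 0, function 0, offset 0x488 gives exactly 0xF8100488. ecam_address() is an illustrative helper:

```c
#include <stdint.h>

/* ECAM: base + bus<<20 + device<<15 + function<<12 + register offset. */
static uint64_t ecam_address(uint64_t base, uint8_t bus, uint8_t dev,
                             uint8_t fn, uint16_t offset)
{
    return base + ((uint64_t)bus << 20) + ((uint64_t)(dev & 0x1F) << 15) +
           ((uint64_t)(fn & 0x7) << 12) + (offset & 0xFFFu);
}
```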

Although everything looked fine at the first sight, no sound can be heard when I tested it. After another investigation I found that HDMI audio works - but only until the first runtime_pm suspend (in my case it happened just before X starts :( ). I use nouveau driver and I have not tried it with proprietary nvidia driver yet.
It looks like the problem is that the audio device is not correctly enabled by kernel after power state resumed to D0 - it has no irq and memory resources associated.
I had intended to read up on PCI PM and device links just before I found this thread, but now I'm going to try the patch found here first, to see whether it solves this as well.

Is anyone else facing the same issues with the nouveau driver and runtime_pm?

Please let me know if I can help somehow.
Comment 33 spamas 2018-01-04 11:18:15 UTC
My system specs:
i7-7700HQ + GTX 1060 6GB
Linux kernel version: 4.10.0-42-generic
Nvidia driver Version: 384.90
OS: Linux Mint 18.2

I can confirm that kernel module, posted by Maik Freudenberg [Comment 27], is working fine on my system. Thank you for the fix. The HDMI audio device now works as it should.

The steps I did to enable HDMI audio device:
1. Download and extract the file nvhda.tar.xz.
2. Run commands in terminal:
   make
   sudo make install
   echo nvhda | sudo tee -a /etc/initramfs-tools/modules
   echo "options nvhda load_state=1" | sudo tee /etc/modprobe.d/nvhda.conf
   sudo update-initramfs -u
3. Reboot.

With this fix, I did not notice any problems with power management or system stability. HDMI audio works at system startup, after resume from sleep, after plugging/unplugging HDMI cable.
Comment 34 Maik Freudenberg 2018-01-04 23:37:20 UTC
(In reply to Martin Lopatář from comment #32)
Really nice find that it depends on power states or changes thereof. Can you check this: when you boot to X with audio disabled and then enable it, either using setpci/rescan with your modified kernel or using the module, does audio work instantly?
Comment 35 Raphaël Doursenaud 2018-02-27 12:57:41 UTC
(In reply to Maik Freudenberg from comment #31)

I usually load the module at boot in the initrd.
I also loaded it manually with X already started.
Plugging in the cable before or after starting X seems to work regardless.
Also works with both nouveau and the proprietary driver.
Comment 36 Maik Freudenberg 2018-03-01 10:42:30 UTC
I don't know about the current state of this, I would put some effort into it if nobody else is working on it. Daniel Drake's patch is a good starting point for that.
Currently, it looks like the problem lies in that the audio dev is only ever looked after on nouveau drm module load. Instead, the state of it should be checked/toggled on
- connector connected/disconnected
- poweron/off the corresponding gpu
E.g. remove the audio device on suspend if it has previously been enabled; on resume, check whether any connector that needs it is connected and enable it again in that case.
This would also work around the quirks of some laptop's acpi toggling the bit either on boot [1] or on suspend [2].
Desktop cards should not be affected by that because they start with audio dev enabled, so it will not be touched.
Quirks will have to be added for nForce chipsets with IGP since those have a separate audio dev.
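The toggling proposed above amounts to a reconcile step run on each listed event: compute whether the audio function should be enabled and act only if that differs from the current state. A hypothetical sketch of that decision; all names are illustrative, the "managed" flag stands for "the function was hidden at boot" so that desktop cards are left alone, and the actual enable/disable (the 0x488 write plus adding or removing the PCI function) is not shown:

```c
#include <stdbool.h>

enum hda_action { HDA_NO_CHANGE, HDA_ENABLE, HDA_DISABLE };

/* Intended to run on connector hotplug and GPU power-on/off events. */
static enum hda_action reconcile_hda(bool managed, bool hda_enabled,
                                     bool gpu_powered,
                                     bool audio_connector_connected)
{
    if (!managed)   /* e.g. desktop cards that boot with audio enabled */
        return HDA_NO_CHANGE;

    /* Keep the function only while the GPU is on and a sink can use it. */
    bool want = gpu_powered && audio_connector_connected;

    if (want && !hda_enabled)
        return HDA_ENABLE;
    if (!want && hda_enabled)
        return HDA_DISABLE;
    return HDA_NO_CHANGE;
}
```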

BTW, the kernel module has moved to:
https://github.com/hhfeuer/nvhda


[1] https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-over-hdmi-only-hda-intel-detected-azalia/post/5228002/#5228002
[2] https://devtalk.nvidia.com/default/topic/1024022/linux/gtx-1060-no-audio-over-hdmi-only-hda-intel-detected-azalia/post/5234127/#5234127
Comment 37 Lukas Wunner 2018-03-03 10:41:53 UTC
Related to this issue, I've just posted v2 of my patch set to use a device link for power management of GPU-integrated HDA controllers:
https://lists.freedesktop.org/archives/dri-devel/2018-March/168012.html

It would be great if more people could test it. There's a 4.15-based branch available at:
https://github.com/l1k/linux/commits/switcheroo_devlink_v2

Crucially, this patch lets the HDA controller autosuspend at its own discretion, rather than forcing it on whenever the GPU is on. It looks like writing to bit 25 of config space dword 0x488 powergates the HDA controller. We could leverage that to runtime suspend the HDA controller to D3cold. I'll see to it that I cook up a patch.

As to the bit being cleared on boot, I think this should be done in a "header" PCI quirk rather than in nouveau. If you look at pci_scan_slot() and next_fn() in drivers/pci/probe.c, you'll notice that device functions are scanned from 0 upwards. So the GPU is always scanned first. Just add a PCI quirk which gets executed for the GPU, sets the bit and then reinitializes the multifunction flag in the GPU's struct pci_dev, that may already be sufficient. The PCI core should do all the rest. See quirk_jmicron_ata() for an example.
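The quirk described above could look roughly like the userspace model below. Everything here is hypothetical: struct fake_pci_dev stands in for struct pci_dev and its config accessors, and cfg_write32() models the hardware behaviour observed in comments 22 and 23 (the Header Type multi-function bit follows bit 25 of dword 0x488). The point is the order of operations: set the bit on function 0, then refresh the cached multifunction flag before the remaining functions are scanned. It is not the attached kernel patch.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define HDA_ENABLE_REG  0x488
#define HDA_ENABLE_BIT  (1u << 25)
#define HEADER_TYPE_REG 0x0E

struct fake_pci_dev {
    uint8_t cfg[4096];    /* extended config space */
    bool multifunction;   /* cached flag, as in struct pci_dev */
};

static uint32_t cfg_read32(const struct fake_pci_dev *d, unsigned off)
{
    uint32_t v;
    memcpy(&v, &d->cfg[off], sizeof v);   /* host byte order is fine here */
    return v;
}

/* Hardware model: Header Type bit 7 mirrors bit 25 of dword 0x488. */
static void cfg_write32(struct fake_pci_dev *d, unsigned off, uint32_t v)
{
    memcpy(&d->cfg[off], &v, sizeof v);
    if (off == HDA_ENABLE_REG) {
        if (v & HDA_ENABLE_BIT)
            d->cfg[HEADER_TYPE_REG] |= 0x80;
        else
            d->cfg[HEADER_TYPE_REG] &= (uint8_t)~0x80u;
    }
}

/* Header-quirk idea: runs on function 0 before functions 1..7 are scanned. */
static void quirk_enable_hda(struct fake_pci_dev *gpu)
{
    uint32_t val = cfg_read32(gpu, HDA_ENABLE_REG);
    if (val & HDA_ENABLE_BIT)
        return;                                   /* already enabled */
    cfg_write32(gpu, HDA_ENABLE_REG, val | HDA_ENABLE_BIT);
    /* Refresh the cached flag so the scan goes on to function 1. */
    gpu->multifunction = (gpu->cfg[HEADER_TYPE_REG] & 0x80) != 0;
}
```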
Comment 38 Lukas Wunner 2018-03-03 12:24:09 UTC
Created attachment 137764 [details] [review]
PCI: Enable power to Nvidia HDA controllers on device enumeration

This (untested) patch is basically what I had in mind with the PCI quirk. Runtime suspending to D3cold would be a separate patch then.
Comment 39 Maik Freudenberg 2018-03-03 13:35:43 UTC
Lukas, unconditionally enabling the Nvidia HDA shouldn't be done. In my case, having a "3D controller", meaning a dGPU without outputs, this would lead to a broken HDA device being visible.
Comment 40 Maik Freudenberg 2018-03-03 13:45:43 UTC
This should be easily avoidable by not enabling it when the adapter class is PCI_CLASS_DISPLAY_3D.
Comment 41 Lukas Wunner 2018-03-03 14:55:42 UTC
(In reply to Maik Freudenberg from comment #39)
> Lukas, unconditionally enabling the nvidia hda shouldn't be done. In my
> case, having a "3D controller" meaning a dGPU without outputs this would
> lead to having a broken HDA device visible.

Hm, broken in what way? My expectation would be that it just reports all connectors as disconnected. That might be irritating to some users, but other than that I don't see a real downside. If the HDA controller is runtime suspended to D3cold, power consumption is identical to the status quo ante.

But yes, just matching for PCI_CLASS_DISPLAY_VGA << 8 instead of PCI_BASE_CLASS_DISPLAY << 16 would also work, assuming that vendors always got this right and never used PCI_CLASS_DISPLAY_3D for a GPU with connectors.
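The two matching options differ only in how many bits of the 24-bit class code are compared. As a rough userspace sketch (not kernel code), the class test behind the DECLARE_PCI_FIXUP_CLASS_* macros compares the fixup's class against the device class shifted right by a class shift; the constants below carry the values from include/linux/pci_ids.h, while the helper names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as in include/linux/pci_ids.h. */
#define PCI_BASE_CLASS_DISPLAY 0x03
#define PCI_CLASS_DISPLAY_VGA  0x0300
#define PCI_CLASS_DISPLAY_3D   0x0302

/* dev_class is the 24-bit code stored in pci_dev->class:
 * base class << 16 | subclass << 8 | programming interface.
 * The fixup matches when its class equals dev_class >> shift. */
static bool class_matches(uint32_t dev_class, uint32_t fixup_class,
                          unsigned shift)
{
    return (dev_class >> shift) == fixup_class;
}

/* Broad match: any display controller (VGA, XGA, 3D, other). */
static bool match_any_display(uint32_t dev_class)
{
    return class_matches(dev_class, PCI_BASE_CLASS_DISPLAY, 16);
}

/* Narrow match: VGA-compatible controllers only, prog-if ignored. */
static bool match_vga_only(uint32_t dev_class)
{
    return class_matches(dev_class, PCI_CLASS_DISPLAY_VGA, 8);
}
```

A 3D-class GPU (class 0x0302xx) passes the broad PCI_BASE_CLASS_DISPLAY match but fails the VGA-only one; a function reporting a non-display class such as 0x0480 (PCI_CLASS_MULTIMEDIA_OTHER) fails both.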
Comment 42 Maik Freudenberg 2018-03-03 15:28:53 UTC
The HDA devices on 3D controllers exist but are not completely configured by vendors, leading to errors like
[ 4790.121207] pci 0000:07:00.1: [10de:0e0f] type 00 class 0x040300
[ 4790.121253] pci 0000:07:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
[ 4790.121610] pci 0000:07:00.1: BAR 0: no space for [mem size 0x00004000]
[ 4790.121613] pci 0000:07:00.1: BAR 0: failed to assign [mem size 0x00004000]
[ 4790.121900] snd_hda_intel 0000:07:00.1: Disabling MSI
[ 4790.122493] snd_hda_intel 0000:07:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
[ 5661.858112] snd_hda_intel 0000:07:00.1: ioremap error

AFAIK, 3D class is used by nvidia to specifically mark the adapters without display engines, like Teslas or some Optimus dGPUs. So excluding them should be safe.
One thing that I don't know about but always had in the back of my mind are Tegras, how those are handled or react.
Looking at the vgaswitcheroo patches, these look like a really fine way to handle pm on these hda devices.
Comment 43 Lukas Wunner 2018-03-03 15:47:08 UTC
(In reply to Maik Freudenberg from comment #42)
> The hda devices on 3D controller are existing but not completely configured
> by vendors leading to errors like
> [ 4790.121207] pci 0000:07:00.1: [10de:0e0f] type 00 class 0x040300
> [ 4790.121253] pci 0000:07:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [ 4790.121610] pci 0000:07:00.1: BAR 0: no space for [mem size 0x00004000]
> [ 4790.121613] pci 0000:07:00.1: BAR 0: failed to assign [mem size 0x00004000]
> [ 4790.121900] snd_hda_intel 0000:07:00.1: Disabling MSI
> [ 4790.122493] snd_hda_intel 0000:07:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> [ 5661.858112] snd_hda_intel 0000:07:00.1: ioremap error

Is this a Thunderbolt-attached GPU? AFAICS these errors are not caused by missing bits in the device configuration but rather improper resource allocation by the kernel. This is common with Thunderbolt, but Mika Westerberg has upstreamed a bunch of fixes for 4.15 and more are currently being upstreamed:

https://lkml.org/lkml/2017/10/13/791
https://www.spinics.net/lists/linux-pci/msg69483.html

Have you tried this on a 4.15 kernel or was it an older version? If the latter, try on 4.15 again and see if the situation has improved.

> Looking at the vgaswitcheroo patches, these look like a really fine way to
> handle pm on these hda devices.

Thanks, glad you like them.
Comment 44 Ilia Mirkin 2018-03-03 17:54:23 UTC
(In reply to Maik Freudenberg from comment #42)
> The hda devices on 3D controller are existing but not completely configured
> by vendors leading to errors like
> [ 4790.121207] pci 0000:07:00.1: [10de:0e0f] type 00 class 0x040300
> [ 4790.121253] pci 0000:07:00.1: reg 0x10: [mem 0x00000000-0x00003fff]
> [ 4790.121610] pci 0000:07:00.1: BAR 0: no space for [mem size 0x00004000]
> [ 4790.121613] pci 0000:07:00.1: BAR 0: failed to assign [mem size 0x00004000]
> [ 4790.121900] snd_hda_intel 0000:07:00.1: Disabling MSI
> [ 4790.122493] snd_hda_intel 0000:07:00.1: can't ioremap BAR 0: [??? 0x00000000 flags 0x0]
> [ 5661.858112] snd_hda_intel 0000:07:00.1: ioremap error
> 
> AFAIK, 3D class is used by nvidia to specifically mark the adapters without
> display engines, like Teslas or some Optimus dGPUs. So excluding them should
> be safe.

Sadly no. This was true for a time, but is unfortunately not accurate across all devices. One can check if the DCB table has any outputs, and only do stuff if there are none. This tends to be representative of whether there are actual outputs on a laptop connected to the GPU. (The DISPLAY block can also be fused off, which can also be checked.)
Comment 45 Denis Lisov 2018-03-04 00:08:52 UTC
(In reply to Lukas Wunner from comment #37)
> Related to this issue, I've just posted v2 of my patch set to use a device
> link for power management of GPU-integrated HDA controllers:
> https://lists.freedesktop.org/archives/dri-devel/2018-March/168012.html
> 
> It would be great if more people could test it. There's a 4.15-based branch
> available at:
> https://github.com/l1k/linux/commits/switcheroo_devlink_v2

Tested these patches on Lenovo Thinkpad P50. Audio works, the HDA and GPU suspend when unused with no errors in logs and resume when needed.
HDA unbind/bind worked with no errors, some "Too big adjustment {128,384}" messages in dmesg.

For this hardware, even an HDMI TV plugged in at boot does not cause the audio device to appear, so the test was done with a PCI early quirk as suggested in this bug.
Comment 46 Maik Freudenberg 2018-03-04 00:58:27 UTC
(In reply to Lukas Wunner from comment #43)
> Is this a Thunderbolt-attached GPU?
Calm down, get back to basics.
This is the dmesg when you turn on the hda dev on a 3d class device.
Just exclude the 3D class from toggling the bit. No big deal.
Comment 47 Maik Freudenberg 2018-03-04 01:07:02 UTC
There's of course the possibility that some braindead vendor would ship a 3D class tagged device actually having outputs. From my observations, this would break the prop. driver so this vendor would have to pay nvidia to add a quirk so this would work for two years.
Don't think too much of it, your approach is fine.
Comment 48 Maik Freudenberg 2018-03-04 01:23:58 UTC
(In reply to Denis Lisov from comment #45)
> For this hardware, even a plugged in HDMI TV on boot does not cause the
> audio device to appear, so the test was done with a PCI early quirk like
> suggested in this bug.
Yes, Lukas' main patches just(!) take care of power management of the hda device so it isn't powered on when it isn't needed which is pretty great.
The PCI quirk takes care of boot time; now it needs a second quirk for resume to have a perfect solution.
Comment 49 Ilia Mirkin 2018-03-04 02:38:12 UTC
(In reply to Maik Freudenberg from comment #47)
> There's of course the possibility that some braindead vendor would ship a 3D
> class tagged device actually having outputs.

This happens. A lot. See comment #44 for the proper procedure.
Comment 50 Lukas Wunner 2018-03-04 10:23:38 UTC
(In reply to Denis Lisov from comment #45)
> (In reply to Lukas Wunner from comment #37)
> > Related to this issue, I've just posted v2 of my patch set to use a device
> > link for power management of GPU-integrated HDA controllers:
> > https://lists.freedesktop.org/archives/dri-devel/2018-March/168012.html
> > 
> > It would be great if more people could test it. There's a 4.15-based branch
> > available at:
> > https://github.com/l1k/linux/commits/switcheroo_devlink_v2
> 
> Tested these patches on Lenovo Thinkpad P50. Audio works, the HDA and GPU
> suspend when unused with no errors in logs and resume when needed.
> HDA unbind/bind worked with no errors, some "Too big adjustment {128,384}"
> messages in dmesg.

Thank you, this helps greatly. I'll add your Tested-by to the commits so that you get credit for your testing efforts.
Comment 51 Lukas Wunner 2018-03-04 10:39:05 UTC
(In reply to Maik Freudenberg from comment #31)
> As a sidenote: another user reported 5 Watts additional power draw when
> enabling the audio function. Regardless of this being accurate it should be
> taken into account to not enable it unconditionally since these are mobile
> devices.

On my GK107 I do not see any change in power consumption regardless whether bit 25 at config space offset 0x488 is set or cleared, which is somewhat disappointing. I also do not see a change in power consumption between PCI power state D0 and D3hot.

So if it's true that enabling the HDA increases power consumption, that would seem to only apply to newer cards.

Could somebody verify this: Do you see a consistent drop or increase in power consumption when enabling/disabling the HDA?

setpci -s 01:00.0 0x488.l=0x0000000:0x2000000    # disable
setpci -s 01:00.0 0x488.l=0x2000000:0x2000000    # enable

Try this a couple of times and see if powertop shows a consistent difference in power consumption. (The laptop needs to run on battery for this to work, so disconnect the charger.)
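The VALUE:MASK form used in the two setpci commands above tells setpci to change only the bits set in MASK and preserve the rest of the dword, so both commands touch bit 25 alone (0x2000000 == 1 << 25). A tiny C model of that read-modify-write, with a hypothetical helper name:

```c
#include <stdint.h>

#define NV_HDA_ENABLE 0x2000000u   /* bit 25, same as 1u << 25 */

/* Models "setpci -s BDF 0x488.l=VALUE:MASK": only bits set in MASK are
 * taken from VALUE; every other bit keeps its current value. */
static uint32_t setpci_masked_write(uint32_t old, uint32_t value,
                                    uint32_t mask)
{
    return (old & ~mask) | (value & mask);
}
```

So the "disable" line clears bit 25 and the "enable" line sets it, while any unrelated bits in the same dword survive either way.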

If you *do* see a difference, double-check whether runtime suspending the HDA to D3hot (using my above-linked switcheroo_devlink_v2 patch set) also shows a difference. You can force the HDA into D0 or let it autosuspend to D3hot like this:

echo on   > /sys/bus/pci/devices/0000:01:00.1/power/control    # keep HDA in D0 (runtime PM off)
echo auto > /sys/bus/pci/devices/0000:01:00.1/power/control    # allow autosuspend to D3hot

If the reduction in power consumption turns out to be the same in D3hot versus disabling via bit 25, there's no point in adding D3cold support to hda_intel.c.
Comment 52 Lukas Wunner 2018-03-04 12:47:50 UTC
(In reply to Ilia Mirkin from comment #49)
> (In reply to Maik Freudenberg from comment #47)
> > There's of course the possibility that some braindead vendor would ship a 3D
> > class tagged device actually having outputs.
> 
> This happens. A lot.

Ilia, do you have definitive knowledge of GPUs which
a) have a different class than PCI_CLASS_DISPLAY_VGA and
b) have working DP/HDMI outputs and
c) have an integrated HDA controller?

I'm asking because get_bound_vga() in sound/pci/hda/hda_intel.c specifically matches against PCI_CLASS_DISPLAY_VGA only. In other words, if a GPU with the three above-listed properties exists and is built into a hybrid graphics laptop, it is currently not registered with vga_switcheroo, which would be wrong.
Comment 53 Ilia Mirkin 2018-03-04 17:28:09 UTC
(In reply to Lukas Wunner from comment #52)
> (In reply to Ilia Mirkin from comment #49)
> > (In reply to Maik Freudenberg from comment #47)
> > > There's of course the possibility that some braindead vendor would ship a 3D
> > > class tagged device actually having outputs.
> > 
> > This happens. A lot.
> 
> Ilia, do you have definitive knowledge of GPUs which
> a) have a different class than PCI_CLASS_DISPLAY_VGA and
> b) have working DP/HDMI outputs and
> c) have an integrated HDA controller?
> 
> I'm asking because get_bound_vga() in sound/pci/hda/hda_intel.c specifically
> matches against PCI_CLASS_DISPLAY_VGA only. In other words, if a GPU with
> the three above-listed properties exists and is built into a hybrid graphics
> laptop, it is currently not registered with vga_switcheroo, which would be
> wrong.

I can say with some certainty that there are laptops running around, esp GM107's, whose pci class is 3D, and that have attached DP/HDMI outputs.

I don't think the users in question ever asked about audio, so I don't know about the last bit. However I can't imagine that it wouldn't be there (esp once all the proper enablement is done).

Is hda_intel only for intel? If so, I'm pretty sure that all intel vga devices are PCI_CLASS_DISPLAY_VGA. However if it's used for everything, then it needs to deal with DISPLAY_3D as well.
Comment 54 Maik Freudenberg 2018-03-06 19:26:13 UTC
(In reply to Ilia Mirkin from comment #44)
> One can check if the DCB table has any outputs, and only do
> stuff if there are none.
I don't see how that's feasible, since this would require loading the ROM and parsing it, and this is about an early PCI quirk, or am I wrong there?
Sidenote: there's also one GTX 560 Ti that advertises itself as PCI_CLASS_MULTIMEDIA_OTHER.
Comment 55 Ilia Mirkin 2018-03-06 19:30:45 UTC
(In reply to Maik Freudenberg from comment #54)
> (In reply to Ilia Mirkin from comment #44)
> > One can check if the DCB table has any outputs, and only do
> > stuff if there are none.
> I don't see how that's feasible since this would require to load the ROM and
> parse it and this is about an early pci quirk or am I wrong there?

Correct.

What's the downside for doing this always btw (except for a fixed list of pci ids/ranges, for the "older" chips, i.e. pre-fermi)?
Comment 56 Lukas Wunner 2018-03-06 19:57:28 UTC
(In reply to Ilia Mirkin from comment #53)
> (In reply to Lukas Wunner from comment #52)
> > Ilia, do you have definitive knowledge of GPUs which
> > a) have a different class than PCI_CLASS_DISPLAY_VGA and
> > b) have working DP/HDMI outputs and
> > c) have an integrated HDA controller?
> > 
> > I'm asking because get_bound_vga() in sound/pci/hda/hda_intel.c specifically
> > matches against PCI_CLASS_DISPLAY_VGA only. In other words, if a GPU with
> > the three above-listed properties exists and is built into a hybrid graphics
> > laptop, it is currently not registered with vga_switcheroo, which would be
> > wrong.
> 
> I can say with some certainty that there are laptops running around, esp
> GM107's, whose pci class is 3D, and that have attached DP/HDMI outputs.
> 
> I don't think the users in question ever asked about audio, so I don't know
> about the last bit. However I can't imagine that it wouldn't be there (esp
> once all the proper enablement is done).

In the meantime I've done extensive googling for dmesg output of a laptop that satisfies all three conditions listed above, but I come back empty-handed. I did find machines with a PCI_CLASS_DISPLAY_3D Nvidia card and an HDA device on function 1, but those weren't laptops. So it looks to me like we're good right now but we'll definitely need to amend hda_intel.c if we expose the HDA device on all modern cards.

> Is hda_intel only for intel?

No, that file contains the driver for all PCI HDA devices, its name is a historic artifact.
Comment 57 Maik Freudenberg 2018-03-06 20:02:41 UTC
(In reply to Ilia Mirkin from comment #55)
> What's the downside for doing this always btw 
By 'this', you mean, always turning it on?
This generates the errors from comment #42 since those devices are not configured resource-wise, all zeros.
I think that's rather ugly: users getting confused, trying to poke sound through the device, spamming mailing lists. And that's just the non-technical side of it.
Comment 58 Ilia Mirkin 2018-03-06 20:19:34 UTC
OK, well, I've seen this both ways -- 3D controllers with outputs as well as VGA display adapters with the display function actually fused off. The only reliable thing is the DCB block, but like you said, it's not appropriate for an early pci quirk.

Finding the specifics is going to be difficult. People come into #nouveau with a pastebin of their dmesg and some problem, and I note it in my head over the years. Finding these will be next to impossible, esp as the pastes expire fairly quickly.

Whatever solution you try to come up with, remember that all these cases are possible, but some more frequent than others.
Comment 59 Peter Wu 2018-03-07 12:04:38 UTC
(In reply to Lukas Wunner from comment #51)
> On my GK107 I do not see any change in power consumption regardless whether
> bit 25 at config space offset 0x488 is set or cleared, which is somewhat
> disappointing. I also do not see a change in power consumption between PCI
> power state D0 and D3hot.

Was this your MacBook or a regular laptop? The Kepler family still used an ACPI _DSM method to remove power; setting the PM bit in the PCI config space did not do anything (this was at least true for my older Fermi card).


Possibly relevant for this bug: I tested Lukas' devlink_v2 patches but found some issues when the HDMI/DP cable was disconnected on a GTX 965M (where by default the audio function would not be visible):
https://lists.freedesktop.org/archives/nouveau/2018-March/029988.html

Do you also experience issues after these steps:
1. remove+rescan to make audio function appear
2. remove GPU power (runtime PM or system sleep)
3. remove HDMI/DP cable
4. restore power
5. check dmesg or try to access audio function (e.g. read PCI regs)
Comment 60 Lukas Wunner 2018-03-09 15:39:52 UTC
Created attachment 137939 [details] [review]
[PATCH 1/3] PCI: Expose Nvidia HDA controllers
Comment 61 Lukas Wunner 2018-03-09 15:40:55 UTC
Created attachment 137940 [details] [review]
[PATCH 2/3] PCI: Apply quirks on runtime resume despite being unbound
Comment 62 Lukas Wunner 2018-03-09 15:41:33 UTC
Created attachment 137941 [details] [review]
[PATCH 3/3] ALSA: hda - Broaden VGA class matching
Comment 63 Lukas Wunner 2018-03-09 16:10:06 UTC
(In reply to Peter Wu from comment #59)
> (In reply to Lukas Wunner from comment #51)
> > On my GK107 I do not see any change in power consumption regardless whether
> > bit 25 at config space offset 0x488 is set or cleared, which is somewhat
> > disappointing. I also do not see a change in power consumption between PCI
> > power state D0 and D3hot.
> 
> Was this your macbook or a regular laptop? The Kepler family still used an
> ACPI _DSM method to remove power, setting the PM bit in the PCI config space
> did not do anything (this was at least true for my older Fermi card).

This was on my MacBook Pro. (Is that not a regular laptop? ;-) ) Power to the GK107 GPU is cut and reinstated by the GMUX controller rather than a _DSM. I can hide and expose the audio device by clearing or setting bit 25 at offset 0x488, but this does not change power consumption in any way. Only cutting power to the GPU via GMUX does.

That the audio device is powergated on newer GPUs was only a theory. Because, why would laptop vendors hide the audio device if nothing is connected? One possible explanation is that it saves power. Someone needs to confirm or debunk this theory by clearing and setting the bit and comparing power consumption in powertop.


> Possibly relevant for this bug: I tested Lukas devlink_v2 patches but found
> some issues when the HDMI/DP cable was disconnected on a GTX 965M (where by
> default the audio function would not be visible):
> https://lists.freedesktop.org/archives/nouveau/2018-March/029988.html
> 
> Do you also experience issues after these steps:
> 1. remove+rescan to make audio function appear
> 2. remove GPU power (runtime PM or system sleep)
> 3. remove HDMI/DP cable
> 4. restore power
> 5. check dmesg or try to access audio function (e.g. read PCI regs)

I've had a look at the acpidump of your machine:
https://github.com/Lekensteyn/acpi-stuff/tree/master/dsl/Clevo_P651RA

Bit 25 at offset 0x488 is named NHDA. In two places (in the _ON method of the root port's power resource and in _PS0 of the GPU device), the AML code probes three GPIO pins and exposes the HDA controller if any of them are high, or else hides it. These GPIO pins are probably wired to HPD of the external connectors.

I think what happens in the scenario you've described above is: On runtime suspend or system suspend, the state of the HDA is saved (BARs etc). Because the cable is disconnected, the HDA will be hidden on resume. Now the kernel tries to wake the HDA device to D0 and will fail (you should see "Refused to change power state, currently in D3" in dmesg). It will then restore the saved state, which will also fail because the device is inaccessible. Also, the saved state is invalidated and cannot be restored again. So the HDA's BARs are blank until you remove+rescan.

I've just attached three patches, please try those on top of my switcheroo_devlink_v2 series, they should fix the issue.

Crucially, patch [1/3] now also applies the PCI quirk during the resume_noirq and runtime_resume phase. This means that in the scenario you've given above, even though the BIOS may have hidden the HDA, that change will be undone by the quirk. The quirk is executed for the GPU device. We know that the HDA device will resume *after* the GPU device because of the device link.
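A toy model of the resume scenario just described, under stated assumptions: the firmware step follows the AML reading above (NHDA set only if one of the three presumed-HPD GPIOs is high), the quirk runs during the GPU's resume, and the device link makes the HDA resume afterwards. struct gpu_model and all function names are hypothetical; this illustrates the ordering argument, not the patch itself.

```c
#include <stdbool.h>

/* Hypothetical model: only the state relevant to the argument. */
struct gpu_model {
    bool nhda;  /* bit 25 at 0x488: HDA function visible */
};

/* Firmware power-up path (_ON / _PS0) as read from the Clevo AML: expose
 * the HDA only if one of the three presumed-HPD GPIOs is high. */
static void firmware_power_on(struct gpu_model *g, const bool hpd[3])
{
    g->nhda = hpd[0] || hpd[1] || hpd[2];
}

/* The PCI quirk, re-run in the GPU's resume_noirq/runtime_resume phase. */
static void quirk_expose_hda(struct gpu_model *g)
{
    g->nhda = true;
}

/* Returns true if the HDA is visible when its own resume runs. The device
 * link makes the GPU (supplier) resume before the HDA (consumer). */
static bool hda_visible_on_resume(const bool hpd[3], bool apply_quirk)
{
    struct gpu_model g = { .nhda = false };

    firmware_power_on(&g, hpd);     /* power restored, cable may be gone */
    if (apply_quirk)
        quirk_expose_hda(&g);       /* GPU resume */
    return g.nhda;                  /* HDA resume sees this state */
}
```

Without the quirk, an unplugged cable leaves the HDA hidden when its resume runs, which matches the failure Peter describes; with the quirk, it is visible regardless of the GPIO state.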

I cannot test these patches myself because the HDA is never hidden on my machine. So the patches are compile-tested only.
Comment 64 Lukas Wunner 2018-03-10 06:36:40 UTC
(In reply to Maik Freudenberg from comment #57)
> (In reply to Ilia Mirkin from comment #55)
> > What's the downside for doing this always btw 
> By 'this', you mean, always turning it on?
> This generates the errors from comment #42 since those devices are not
> configured resource-wise, all zeros.

I would assume the BAR is all zeroes because there was insufficient space to accommodate the additional 16K for it (the log shows a BAR of size 0x4000). AFAIUI, the kernel assigns only the minimal amount of space to the bridge window that is necessary to accommodate the devices below. Hence, if a device is added below the bridge after the fact, there may not be sufficient space available for it. I'd expect the situation to be different if the device is already present on enumeration, as is done by the "header" PCI quirk. Then the bridge window can be sized appropriately. Could you verify that on the machine in question?
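The bridge-window explanation can be illustrated with a toy allocator, assuming a window that was sized only for the devices present at enumeration and a simple "next free address" cursor; the real allocator in drivers/pci/setup-bus.c is considerably more involved. The 0x4000 size matches reg 0x10 in the log from comment #42; the addresses and helper names are made up for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Round addr up to a multiple of align (align must be a power of two). */
static uint64_t align_up(uint64_t addr, uint64_t align)
{
    return (addr + align - 1) & ~(align - 1);
}

/* Toy check: can a naturally aligned memory BAR of the given size still be
 * placed in the bridge window [win_start, win_end] when everything below
 * next_free is already assigned to other devices? */
static bool bar_fits(uint64_t win_start, uint64_t win_end,
                     uint64_t next_free, uint64_t size)
{
    uint64_t start = next_free > win_start ? next_free : win_start;
    uint64_t base = align_up(start, size);

    return base + size - 1 <= win_end;
}
```

A window sized exactly for a 16 MiB GPU BAR has no room for the extra 16 KiB, whereas a window sized while the HDA is already visible would include it from the start.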

It occurred to me that we could at least check for the presence of the Optimus _DSM in the PCI quirk. That would be sufficiently small for a quirk. That way it would be constrained to Optimus laptops, but sadly we'd still expose an audio function if the GPU has no outputs at all. However, I notice that drivers/gpu/drm/nouveau/nouveau_acpi.c defines an OPTIMUS_HDA_CODEC_MASK macro ("hda bios codec supported"). Would that be of any use? Or can we query via the _DSM whether the GPU has any outputs?
Comment 65 Maik Freudenberg 2018-03-10 19:14:11 UTC
I feel like I'm jumping from puddle to puddle now.
Lukas, I previously couldn't test your patches due to an acpica bug affecting my machine rendering results useless. Luckily, this has been fixed now.

Meanwhile, I discovered that (at least) kernels 4.9-4.13 suffer from some pci config space bug making re-reading the header type impossible. The kernel will always put out a cached version thus making it impossible to enable the hda dev with any method, even remove/rescan.
So a fair warning to users running a 4.9LTS kernel trying to turn on audio. Kernel series 4.4 or 4.14+ work.

Now with the acpi bug fixed, I applied both of your v2 patchsets with some nip/tuck to a 4.16rc4+ kernel and ran into the next oddity: while the pci quirk in general works, re-reading the header type will always return 0x00 so the hda device is not added.
lspci confirms that the quirk is otherwise working:
00: de 10 92 12 07 00 10 00 a1 00 02 03 00 00 80 00
480: 00 00 00 00 17 00 00 00 00 00 00 02 00 00 00 00

I'll investigate what's happening over the next few days.
Comment 66 Lukas Wunner 2018-03-10 19:42:15 UTC
(In reply to Maik Freudenberg from comment #65)
> Now with the acpi bug fixed, I applied both of your v2 patchsets with some
> nip/tuck to a 4.16rc4+ kernel and ran into the next oddity: while the pci
> quirk in general works, re-reading the header type will always return 0x00
> so the hda device is not added.
> lspci confirms that the quirk is otherwise working:
> 00: de 10 92 12 07 00 10 00 a1 00 02 03 00 00 80 00
> 480: 00 00 00 00 17 00 00 00 00 00 00 02 00 00 00 00

Hm, the multifunction bit is set. Try adding an msleep(100) after setting the bit at offset 0x488 and before re-reading the header type register. 100 ms is likely way too much, this is just to ensure it's not too short.

Also, try changing DECLARE_PCI_FIXUP_CLASS_HEADER to DECLARE_PCI_FIXUP_CLASS_EARLY. Shouldn't make a difference but maybe I missed something.

Does it work if you just set gpu->multifunction = 1?
Comment 67 Maik Freudenberg 2018-03-10 20:55:59 UTC
(In reply to Lukas Wunner from comment #66)
> Hm, the multifunction bit is set. Try adding an msleep(100) after setting
> the bit at offset 0x488 and before re-reading the header type register. 100
> ms is likely way too much, this is just to ensure it's not too short.
No success. 
> Also, try changing DECLARE_PCI_FIXUP_CLASS_HEADER to
> DECLARE_PCI_FIXUP_CLASS_EARLY. Shouldn't make a difference but maybe I
> missed something.
No success.
> Does it work if you just set gpu->multifunction = 1?
Nope.
Completely absurd. Looks a bit like the mentioned bug in 4.9 though it vanishes after pci init.
Comment 68 Lukas Wunner 2018-03-11 12:53:50 UTC
Created attachment 137989 [details] [review]
Patch to simulate hidden HDA on systems which normally expose it

This is a quick & dirty hack to hide the HDA controller in the EFI stub. It can be used to simulate a hidden HDA controller on systems which normally expose it. Register 0x488 is written for any PCI device that has a ROM. On my machine that happens to only be the case for the Nvidia GPU.

If I boot my system with this patch applied, the HDA controller is not enumerated and invoking lspci on the GPU shows that the multifunction bit is cleared, as is bit 25 at offset 0x488.

Applying the three other patches I've attached to this bugzilla on top results in the HDA controller being enumerated again. So the patches seem to work as intended.

One interesting behavior I've noticed though: If I hide the HDA in the EFI stub and do not expose it with a PCI quirk, putting the GPU into D3cold and back into D0 causes the multifunction bit to be set, same for bit 25 at offset 0x488, even though the bits were cleared before the GPU went into D3cold. I don't see anything in the ACPI tables indicating that bit 25 is written; I'm rather under the impression that the GPU has some non-volatile memory where the vendor preconfigures bits that are restored when the device comes out of power-on reset. But because the PCI quirk is also executed on resume of the GPU, we should have that covered.
Comment 69 Daniel Drake 2018-03-16 03:54:56 UTC
Thanks for the efforts here. I have tested the switcheroo_devlink_v2 branch plus patches:

 [PATCH 1/3] PCI: Expose Nvidia HDA controllers 
 [PATCH 2/3] PCI: Apply quirks on runtime resume despite being unbound
 [PATCH 3/3] ALSA: hda - Broaden VGA class matching 

on Asus GL502VS with GP104M [GeForce GTX 1070].

The quirk works in that the HDMI audio PCI device now appears. I checked with and without the quirk: the login screen idle power consumption from /sys/class/power_supply/BAT0/power_now is 31W-32W in both cases (no difference observed with the quirk).
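For repeatable before/after comparisons like this one, a small reader for the battery's power_now attribute may help. This is a sketch assuming the power_supply sysfs convention of reporting power_now in microwatts; the battery name (BAT0 here) varies between machines, some batteries may only expose current_now and voltage_now, and the helper names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

/* Parse the contents of power_now (microwatts) into watts; -1.0 on error. */
static double parse_power_now_watts(const char *text)
{
    char *end;
    long uw = strtol(text, &end, 10);

    if (end == text || uw < 0)
        return -1.0;
    return (double)uw / 1e6;
}

/* Read one sample, e.g. from /sys/class/power_supply/BAT0/power_now. */
static double read_power_now_watts(const char *path)
{
    char buf[64];
    FILE *f = fopen(path, "r");

    if (!f)
        return -1.0;
    if (!fgets(buf, sizeof(buf), f)) {
        fclose(f);
        return -1.0;
    }
    fclose(f);
    return parse_power_now_watts(buf);
}

/* Average several samples so a single spike does not decide the result. */
static double mean_watts(const double *samples, int n)
{
    double sum = 0.0;

    if (n <= 0)
        return -1.0;
    for (int i = 0; i < n; i++)
        sum += samples[i];
    return sum / n;
}
```

Taking several samples with the quirk applied and without it, then comparing the means, mirrors the measurement described above.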

However, with no graphics driver loaded, or with nouveau loaded, HDMI audio does not work at all and the eld files in /proc/asound show that no HDMI display is connected. If I load the proprietary nvidia driver and then start X, the eld files change their values to indicate a connected device, and HDMI audio works. After suspend and resume the HDMI audio continues working.

I also tested GL702VMK with GP106M [GeForce GTX 1060], with the proprietary nvidia driver, HDMI audio now works and also after suspend/resume.

Also tested Asus UX550GE with GP107M [GeForce GTX 1050 Ti Mobile] with the proprietary driver. This one could not be worked around with previous userspace approaches (setpci, remove, rescan) because after remove/rescan the magic bit had gone back to zero. But this new kernel approach works, the magic bit gets stuck on and HDMI audio works. Also tested suspend/resume and it still works.

Results look good, but I have 2 concerns:
 1. The proprietary nvidia driver does something special on these platforms to make the HDMI audio work after it has appeared. nouveau does not do this, so this will not fix HDMI audio for nouveau users on some platforms, something else is needed too.
 2. Do we understand why the underlying nvidia/windows design is like this in the first place? I previously confirmed that Windows dynamically enables/disables this magic bit based on presence of an HDMI display - i.e. Windows does not take the approach being considered here, which makes it always-on. Is there a good reason for this that we're missing?
Comment 70 Aaron Plattner 2018-04-24 20:27:10 UTC
How the audio and video drivers interact is described a bit in [1]: The audio function and the graphics function work hand-in-hand to provide audio support. At modeset time, the graphics driver extracts information from the display's EDID and the mode timings and sticks it into registers in the graphics function. These get processed by the hardware and spit out the other side via the "EDID-Like Data" fields in the audio function, described in [2].

I'm not completely sure how this works on Windows, but my understanding is that the display driver enables or disables the audio function at modeset time, and Windows handles that as a PCI hot-plug or -unplug, loading or unloading the audio driver as necessary. On Linux, I think the nouveau and nvidia-modeset drivers would need to call into some hypothetical new audio power module to coordinate that, so the kernel can know when to rescan or remove the audio device.

[1] https://download.nvidia.com/XFree86/gpu-hdmi-audio-document/#_driver_architecture
[2] https://www.intel.com/content/www/us/en/standards/high-definition-audio-specification.html
Comment 71 Maik Freudenberg 2018-05-20 01:13:35 UTC
So you're talking about the driver toggling the bit and emitting a general hotplug event to trigger a rescan of the pci bus? Having one of those would sure be handy.
(You can imagine in windows a daemon triggering this every second to detect new hardware)
OTOH though, doing a rescan on its own subdevice isn't complicated, take as proof comment #26.
For any out-of-driver solution, fixing this: https://devtalk.nvidia.com/default/topic/1016046/?comment=5250957
would be helpful.
Comment 72 Mark Ackerman 2018-09-18 10:43:44 UTC
Mark Thank you SOOOOO Much.  

It's now my seventh day with this new HP Omen X 17-ap0xx and its Nvidia GTX 1080m (HP's most awesome Laptop / GPU as of today WooHoo). I have always despised Windows and stuck with Ubuntu since Vista 13 years ago - and I'm proud to have helped hundreds if not more convert to Ubuntu, but lately I am proud to say it does not take too much effort, as Ubuntu is so EASY, for all drivers and ease of use, but,

BUT! With this Bleeding Edge Laptop I have been pulling out what hair is left on my head trying to simply get HDMI sound working. I was starting to think I was wrong 13 years later if Linux/Ubuntu can't even have digital sound easily on a 6 month old Video card ... And NO help from 20+ sources, and then I stumbled upon this fix.  Thank you So Much!   You have renewed my faith - So to speak

And I have spread the word to every blog I was on about this problem with your AWESOME Fix.

But Pray tell, 6 months later and linux/Ubuntu has not got this solved yet - what is their PROBLEM!   Or is this an evolving problem from Hp getting proprietary?  Can you respond to why this fix is not yet into the mainstream Ubuntu?

Always Trying to help, and in this thanks for the help, Me Mark TOO - Cheers!
Comment 73 fheday 2018-09-21 11:16:23 UTC
Hi all, I can confirm that the module posted by Maik Freudenberg [Comment 27] works nicely: I can turn the HDMI audio on/off on the fly with no problems at all! 
Thank you very much!
Comment 74 Eugene Medvedev 2019-03-31 09:57:59 UTC
(In reply to Maik Freudenberg from comment #27)
> Created attachment 136418 [details]
> Kernel module to toggle audio function

I have a somewhat strange variation on the bug when using your kernel module workaround.

MSI GE63 Raider 8RF looks like this when everything's working:

00:00.0 Host bridge: Intel Corporation 8th Gen Core Processor Host Bridge/DRAM Registers (rev 07)
00:01.0 PCI bridge: Intel Corporation Xeon E3-1200 v5/E3-1500 v5/6th Gen Core Processor PCIe Controller (x16) (rev 07)
00:02.0 VGA compatible controller: Intel Corporation Device 3e9b
00:12.0 Signal processing controller: Intel Corporation Cannon Lake PCH Thermal Controller (rev 10)
00:14.0 USB controller: Intel Corporation Cannon Lake PCH USB 3.1 xHCI Host Controller (rev 10)
00:14.2 RAM memory: Intel Corporation Cannon Lake PCH Shared SRAM (rev 10)
00:14.3 Network controller: Intel Corporation Wireless-AC 9560 [Jefferson Peak] (rev 10)
00:16.0 Communication controller: Intel Corporation Cannon Lake PCH HECI Controller (rev 10)
00:17.0 SATA controller: Intel Corporation Device a353 (rev 10)
00:1d.0 PCI bridge: Intel Corporation Cannon Lake PCH PCI Express Root Port 9 (rev f0)
00:1d.6 PCI bridge: Intel Corporation Device a336 (rev f0)
00:1f.0 ISA bridge: Intel Corporation Device a30d (rev 10)
00:1f.3 Audio device: Intel Corporation Cannon Lake PCH cAVS (rev 10)
00:1f.4 SMBus: Intel Corporation Cannon Lake PCH SMBus Controller (rev 10)
00:1f.5 Serial bus controller [0c80]: Intel Corporation Cannon Lake PCH SPI Controller (rev 10)
01:00.0 VGA compatible controller: NVIDIA Corporation GP104M [GeForce GTX 1070 Mobile] (rev a1)
01:00.1 Audio device: NVIDIA Corporation GP104 High Definition Audio Controller (rev a1)
02:00.0 Non-Volatile memory controller: Kingston Technology Company, Inc. Device 5008 (rev 01)
03:00.0 Ethernet controller: Qualcomm Atheros Killer E2500 Gigabit Ethernet Controller (rev 10)

What's strange is that on a cold boot (using the proprietary driver) the audio subdevice is missing entirely, even from "lspci -H1", and can't be enabled:

[    3.214369] nvhda: version 0.01
[    3.214373] nvhda: Found nv VGA device 0000:01:00.0
[    3.214377] nvhda: enabling audio
[    3.214378] nvhda: Not multifunction, no audio
[    3.214380] nvhda: Succesfully loaded. Audio 0000:01:00.0 is off

Unloading and reloading the module after the boot has completed makes no difference, it doesn't work. Upon suspending and resuming once, however, the device is there on "lspci -H1", and the module can enable it. I ended up setting the module up to unload on suspend and load on resume, so that I can just do a suspend/resume when I need to use HDMI output, which isn't that often.
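The "Not multifunction, no audio" message suggests that the module first checks whether the GPU advertises more than one PCI function. Below is a minimal sketch of such a check. It assumes the check looks at the standard PCI multi-function flag (bit 7 of the Header Type register at config offset 0x0E). The helper names are hypothetical and are not taken from the nvhda source:

```python
from pathlib import Path

HEADER_TYPE_OFFSET = 0x0E  # standard PCI Header Type register
MULTIFUNCTION_FLAG = 0x80  # bit 7: device implements functions beyond .0


def is_multifunction(config: bytes) -> bool:
    """Return True if a config-space dump advertises a multi-function device."""
    if len(config) <= HEADER_TYPE_OFFSET:
        raise ValueError("config space dump too short")
    return bool(config[HEADER_TYPE_OFFSET] & MULTIFUNCTION_FLAG)


def gpu_is_multifunction(bdf: str = "0000:01:00.0") -> bool:
    # Unprivileged reads of the sysfs "config" file still return the first
    # 64 bytes, which is enough to reach the Header Type register.
    config = Path(f"/sys/bus/pci/devices/{bdf}/config").read_bytes()
    return is_multifunction(config)
```

On the affected machine, one would expect this to return False after a cold boot and True once the audio function has become visible.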

I can supply further data and experiment a bit if needed, but my understanding of this area of hardware is essentially nil.
Comment 75 jonasz 2019-06-02 22:29:07 UTC
Hi guys,

I have a dell t7610 centOS installation with two K6000 gpus with the latest available firmware installed. These gpus have two DisplayPorts and two DVI ports each.

There is no hdmi sound coming from any of the displayports, video is ok but no sound at all when using the latest NVIDIA available drivers. I did get sound when previously using nouveau.

I have spent the last couple of days trying and testing things that I have found over the web. FiR example https://download.nvidia.com/XFree86/gpu-hdmi-audio-document/ Besides quite a few crashes and reinstalls, still no 

Could this patch be used in my situation?

Thanks in advance,

Regards
Comment 76 Maik Freudenberg 2019-06-03 12:46:56 UTC
(In reply to jonasz from comment #75)
> Hi guys,
> 
> I have a dell t7610 centOS installation with two K6000 gpus with the latest
> available firmware installed. These gpus have two DisplayPorts and two DVI
> ports each.

This bug report is about the sound device of notebooks. So this does not apply to your desktop hardware. You're probably hitting this nvidia bug:
https://devtalk.nvidia.com/default/topic/1044547/linux/audio-problems-with-the-415-18-drivers/
The workaround would be to downgrade the driver to v410 or v390.
Comment 77 Daniel Drake 2019-06-20 07:25:33 UTC
Created attachment 144596 [details]
Acer Predator G3-572 acpidump

Martin Lopatář, thanks for the analysis above, sorry I missed those details before!

Checking on my Acer Predator G3-572 (acpidump attached here), I can see what you're referring to in SSDT5:

        OperationRegion (PCNV, SystemMemory, \_SB.PCI0.PEG0.PEGP.EBAS, 0x1000)
        Field (PCNV, AnyAcc, NoLock, Preserve)
        {
            Offset (0x488), 
                ,   25, 
            MLTF,   1
        }

        Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
        {
            If (DGOS)
            {
                If ((\_SB.PCI0.PEG0.PEGP.DPCS != Zero))
                {
                    \_SB.PCI0.PEG0.PEGP._ON ()
                    DGOS = Zero
                }
            }
            ElseIf ((\_SB.PCI0.PEG0.DVID != 0xFFFF))
            {
                If ((GGIV (0x01080001) == Zero))
                {
                    MLTF = Zero
                }
                Else
                {
                    MLTF = One
                    \_SB.PCI0.PEG0.PEGP.NASV = \_SB.PCI0.PEG0.PEGP.DSSV
                }
            }
        }

MLTF presumably means multifunction and it's exactly the bit we've been working with. But I haven't yet managed to get _PS0 to run this code. I get to the GGIV(0x01080001) call, but it returns 0, so the bit doesn't get set.
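For reference, the bit position of MLTF follows directly from the ASL field list. "Offset (0x488)" followed by 25 unnamed bits puts MLTF at bit 25 of the dword at offset 0x488 (mask 0x02000000), which is bit 1 of byte 0x48B. The sketch below just does that arithmetic. The constant and function names are hypothetical. The idea that PCNV (based at PEGP.EBAS, length 0x1000) maps the GPU's own 4 KiB PCIe config space is an inference from the region size; the SSDT does not state it.

```python
MLTF_DWORD_OFFSET = 0x488  # "Offset (0x488)" in the PCNV field list
MLTF_BIT = 25              # MLTF follows 25 unnamed bits


def field_bit_location(byte_offset, skipped_bits):
    """Byte address and bit-within-byte of a 1-bit field that follows
    `skipped_bits` unnamed bits after Offset(byte_offset)."""
    absolute_bit = byte_offset * 8 + skipped_bits
    return absolute_bit // 8, absolute_bit % 8


def with_mltf(dword, enabled):
    """Return the 32-bit dword at MLTF_DWORD_OFFSET with MLTF set or cleared."""
    mask = 1 << MLTF_BIT  # 0x02000000
    return (dword | mask) if enabled else (dword & ~mask & 0xFFFFFFFF)
```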

I looked for some obvious connection to stuff in _DSM but I can't spot anything. Can you clarify exactly what you saw that links _DSM and _PS0 together, and share your acpidump?

I tried understanding what GGIV() does but nothing is clear there. It ends up reading bit 1 from physical memory address 0xfdac0408 which is under:
pci_bus 0000:00: resource 21 [mem 0xfd000000-0xfe7fffff window]
and I can't immediately spot any ACPI code that writes to that address.
Comment 78 Lukas Wunner 2019-06-20 08:16:41 UTC
(In reply to Daniel Drake from comment #77)
> MLTF presumably means multifunction and it's exactly the bit we've been
> working with. But I haven't yet managed to get _PS0 to run this code. I get
> to the GGIV(0x01080001) call, but it returns 0, so the bit doesn't get set.
> 
> I tried understanding what GGIV() does but nothing is clear there. It ends
> up reading bit 1 from physical memory address 0xfdac0408 which is under:
> pci_bus 0000:00: resource 21 [mem 0xfd000000-0xfe7fffff window]
> and I can't immediately spot any ACPI code that writes to that address.

GGIV is a method name used by many vendors to read a GPIO pin. "Get GPIO Input Value" or something like that.

Quite likely the GPIO pin in question is attached to HPD of an HDMI or DP port. So if an external display is attached, GGIV(0x01080001) should return 1 and the HDA is exposed, else it's hidden.
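This hypothesis can be made concrete with a small model of the ElseIf branch of the _PS0 method quoted in comment 77. It is a hedged Python sketch, not ACPI code, and rests on two assumptions:

- GGIV returns bit 1 of the dword at 0xfdac0408, as Daniel traced.
- That pin carries HPD, so MLTF (and with it the HDA function) is set only if a display is plugged in when _PS0 runs.

Function names are hypothetical. The DGOS path, which calls _ON instead, is modeled only as "MLTF not written here".

```python
GGIV_BIT = 1  # GGIV(0x01080001) was traced to bit 1 of the dword at 0xfdac0408


def ggiv(gpio_register: int) -> int:
    """Model of GGIV(): input level (0 or 1) of one GPIO pad."""
    return (gpio_register >> GGIV_BIT) & 1


def ps0_mltf(dgos: bool, dvid: int, gpio_register: int):
    """Value _PS0 would write to MLTF, or None if MLTF is left untouched.
    Mirrors the If (DGOS) / ElseIf (DVID != 0xFFFF) structure of the SSDT."""
    if dgos:
        # This branch powers the GPU on via _ON and does not touch MLTF.
        return None
    if dvid == 0xFFFF:
        # Nothing responds on the bus below PEG0: no write at all.
        return None
    # If the pin is HPD: display attached -> expose HDA, otherwise hide it.
    return 0 if ggiv(gpio_register) == 0 else 1
```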

If the GPIO pin in question is on the PCH, then you can download the PCH spec from Intel's website to verify that the MMIO space at 0xfdac0408 (the address GGIV(0x01080001) ends up reading) is a GPIO block. The GPIO pin could also be on the Nvidia card itself; in that case, physical address 0xfdac0408 would belong to a resource of the GPU's PCI device.
Comment 79 Daniel Drake 2019-06-24 09:56:03 UTC
Thanks, you're right, the value changes based on HDMI connector status.

So for this platform, \_SB.PCI0.PEG0 has a PG00 PowerResource that will set the magic bit in _ON, and likewise \_SB.PCI0.PEG0.PEGP has a _PS0 that will set the bit too.

This all sounds like it should set the appropriate state at boot time, but I wouldn't expect these methods to be called when the HDMI connector is hotplugged. And I can't see any linkage to anything more dynamic like _DSM.

Indeed, when booting Linux with HDMI already connected, the HDMI audio PCI device appears. The same happens on Windows, tested without the nvidia driver installed.

I used the Clover UEFI bootloader to boot Windows with a custom DSDT in which the GGIV() function was modified to always return zero (disconnected) for this GPIO. When booting Windows with HDMI connected, the PCI device then no longer appears.

Then I installed the Nvidia Windows driver. Whether HDMI is connected at boot or later, the HDMI audio device now appears on the PCI bus.

Conclusion: The nvidia windows driver directly controls the magic bit here, triggering a PCI bus rescan too, without relying on ACPI.
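On Linux, the same two steps (flip the bit, then rescan) can be done from user space. The sketch below is a hedged illustration, not the Nvidia driver's method, and rests on three assumptions:

- MLTF is bit 25 of the GPU config dword at 0x488, per the SSDT in comment 77.
- The GPU is 0000:01:00.0.
- The script runs as root, because unprivileged writes to the sysfs "config" file are rejected.

It sets the bit through the sysfs "config" file and then writes "1" to "/sys/bus/pci/rescan" so the PCI core enumerates function 1. The function name is hypothetical.

```python
from pathlib import Path

MLTF_OFFSET = 0x488
MLTF_MASK = 1 << 25


def expose_hda(gpu_bdf="0000:01:00.0", sysfs=Path("/sys/bus/pci")):
    """Set the hidden-HDA bit in the GPU's config space, then rescan the bus.
    Returns the dword value that was written."""
    config = sysfs / "devices" / gpu_bdf / "config"
    # Unbuffered I/O so exactly 4 bytes are read and written at MLTF_OFFSET.
    with open(config, "r+b", buffering=0) as f:
        f.seek(MLTF_OFFSET)
        value = int.from_bytes(f.read(4), "little") | MLTF_MASK
        f.seek(MLTF_OFFSET)
        f.write(value.to_bytes(4, "little"))
    # Ask the PCI core to enumerate the newly visible function (01:00.1).
    (sysfs / "rescan").write_text("1")
    return value
```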
Comment 80 Daniel Drake 2019-07-11 07:24:10 UTC
A patch titled "PCI: Enable NVIDIA HDA controllers" (effectively attachment #137939 [details] [review]) is headed into linux-next and potentially Linux 5.3. Testing appreciated!
Comment 81 Przemysław Kopa 2019-09-28 08:28:25 UTC
(In reply to Daniel Drake from comment #80)
> A patch titled "PCI: Enable NVIDIA HDA controllers" (effectively attachment
> #137939 [details] [review]) is headed into linux-next and potentially Linux
> 5.3. Testing appreciated!

Hello there! This commit prevents vga_switcheroo from turning off the dGPU (GeForce 540M) on my Optimus machine, as described here:
https://bbs.archlinux.org/viewtopic.php?pid=1865512#p1865512. I never had problems with HDMI audio on this system; I think all connectors are wired to the Intel GPU, yet the NVIDIA HDA is still enabled by this patch.
Comment 82 Lukas Wunner 2019-09-28 10:08:49 UTC
(In reply to Przemysław Kopa from comment #81)
> (In reply to Daniel Drake from comment #80)
> > A patch titled "PCI: Enable NVIDIA HDA controllers" (effectively attachment
> > #137939 [details] [review]) is headed into linux-next and potentially Linux
> > 5.3. Testing appreciated!
> 
> Hello there! This commit prevents vga_switcheroo from turning off dGPU
> (Geforce 540m) on my optimus machine as described here:
> https://bbs.archlinux.org/viewtopic.php?pid=1865512#p1865512. I never had
> problems with HDMI audio on this system - I think all connectors are
> connected to the Intel GPU, yet NVIDIA HDA is still enabled by this patch.

If the HDA is kept runtime resumed, e.g. if it is accessed by a user space application, then the GPU is kept runtime resumed as well.

Check the runtime PM status of the HDA:
grep . /sys/bus/pci/devices/0000:01:00.1/power/*

Check if user space applications have opened the HDA:
lsof /dev/snd/controlC1
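The first check can also be scripted. Below is a minimal Python sketch equivalent to the grep above, assuming the HDA is at 0000:01:00.1 as in this report. Attributes whose read fails are recorded as None instead of aborting the whole dump; autosuspend_delay_ms is one example, since it can return an I/O error when autosuspend is not in use. The function name is hypothetical.

```python
from pathlib import Path


def runtime_pm_status(bdf="0000:01:00.1", root=Path("/sys/bus/pci/devices")):
    """Rough equivalent of `grep . <dev>/power/*`: map attribute -> value."""
    status = {}
    for attr in sorted((root / bdf / "power").iterdir()):
        if not attr.is_file():
            continue
        try:
            status[attr.name] = attr.read_text().strip()
        except OSError:
            # e.g. EIO from autosuspend_delay_ms
            status[attr.name] = None
    return status
```

A result with runtime_status "active", runtime_usage "0" and runtime_active_kids "0" means nothing obvious holds the device awake.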
Comment 83 Przemysław Kopa 2019-09-28 10:58:51 UTC
> Check the runtime PM status of the HDA:
> grep . /sys/bus/pci/devices/0000:01:00.1/power/*

This is the output of grep . /sys/bus/pci/devices/0000:01:00.1/power/*:

/sys/bus/pci/devices/0000:01:00.1/power/async:enabled
/sys/bus/pci/devices/0000:01:00.1/power/control:on
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_time:122453
/sys/bus/pci/devices/0000:01:00.1/power/runtime_enabled:forbidden
/sys/bus/pci/devices/0000:01:00.1/power/runtime_status:active
/sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time:4310
/sys/bus/pci/devices/0000:01:00.1/power/runtime_usage:1
/sys/bus/pci/devices/0000:01:00.1/power/wakeup:disabled

> Check if user space applications have opened the HDA:
> lsof /dev/snd/controlC1

Output of lsof /dev/snd/controlC1 is empty. Normally I'm using pulseaudio and it grabs the card, but I uninstalled it for testing purposes and it doesn't make a difference - card stays enabled.

Additionally, if I try to manually turn off dGPU using echo OFF > /sys/kernel/debug/vgaswitcheroo/switch, it does nothing.
Comment 84 Lukas Wunner 2019-09-28 12:28:26 UTC
(In reply to Przemysław Kopa from comment #83)
> This is the output of grep . /sys/bus/pci/devices/0000:01:00.1/power/*:
> 
> /sys/bus/pci/devices/0000:01:00.1/power/control:on

Something has forced the HDA on, hence it doesn't autosuspend.

Does the issue go away if you re-enable runtime suspend? Try:

echo auto > /sys/bus/pci/devices/0000:01:00.1/power/control
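Writing "auto" here re-enables runtime suspend. According to the kernel's runtime PM documentation, writing "on" to power/control calls pm_runtime_forbid() and writing "auto" calls pm_runtime_allow(). Below is a hedged Python equivalent of the echo. It needs root, and the path default and helper name are illustrative:

```python
from pathlib import Path

# "auto": allow runtime suspend (pm_runtime_allow());
# "on": keep the device powered (pm_runtime_forbid()).
VALID_MODES = {"auto", "on"}


def set_runtime_pm(bdf="0000:01:00.1", mode="auto", root=Path("/sys/bus/pci/devices")):
    """Write `mode` to the device's power/control file and return the new value."""
    if mode not in VALID_MODES:
        raise ValueError(f"unsupported mode {mode!r}")
    control = root / bdf / "power" / "control"
    control.write_text(mode)
    return control.read_text().strip()
```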
Comment 85 Przemysław Kopa 2019-09-28 12:55:32 UTC
> Something has forced the HDA on, hence it doesn't autosuspend.
> 
> Does the issue go away if you re-enable runtime suspend? Try:
> 
> echo auto > /sys/bus/pci/devices/0000:01:00.1/power/control

Unfortunately, it doesn't - HDA still stays active.

cat /sys/bus/pci/devices/0000:01:00.1/power/control
auto

cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS-Audio: :DynPwr:0000:01:00.1
2:DIS: :DynPwr:0000:01:00.0
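Each line of the vgaswitcheroo "switch" file appears to have the form index:client:active:power:PCI-device. A "+" marks the active GPU, and a "Dyn" prefix on the power column means the driver controls power dynamically via runtime PM. A small parser makes it easy to see that both the dGPU and its audio function are reported as powered ("DynPwr") rather than "DynOff". The field names below are my own, and the format is inferred from the output above:

```python
from typing import NamedTuple


class SwitcherooClient(NamedTuple):
    index: int
    kind: str      # IGD, DIS or DIS-Audio
    active: bool   # "+" marks the GPU currently in use
    power: str     # e.g. Pwr, Off, DynPwr, DynOff
    pci_dev: str


def parse_switch(text):
    """Parse the contents of /sys/kernel/debug/vgaswitcheroo/switch."""
    clients = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split at most 4 times.
        idx, kind, active, power, pci_dev = line.split(":", 4)
        clients.append(SwitcherooClient(int(idx), kind, active == "+", power, pci_dev))
    return clients
```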
Comment 86 Lukas Wunner 2019-09-28 15:00:03 UTC
Please provide the output of "grep . /sys/bus/pci/devices/0000:01:00.1/power/*" after echoing "auto" to its "control" file.
Comment 87 Przemysław Kopa 2019-09-28 15:14:55 UTC
(In reply to Lukas Wunner from comment #86)
> Please provide the output of "grep .
> /sys/bus/pci/devices/0000:01:00.1/power/*" after echoing "auto" to its
> "control" file.

Here it is:
/sys/bus/pci/devices/0000:01:00.1/power/async:enabled
grep: /sys/bus/pci/devices/0000:01:00.1/power/autosuspend_delay_ms: Błąd wejścia/wyjścia // translates to: "Input/output error"
/sys/bus/pci/devices/0000:01:00.1/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_time:105383
/sys/bus/pci/devices/0000:01:00.1/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/power/runtime_status:active
/sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time:6695
/sys/bus/pci/devices/0000:01:00.1/power/runtime_usage:0
/sys/bus/pci/devices/0000:01:00.1/power/wakeup:disabled
Comment 88 Daniel Drake 2019-10-02 08:13:05 UTC
So we don't have clear insight as to why the device is not being runtime suspended. I'll try to reproduce in our lab.
Comment 89 Maik Freudenberg 2019-10-02 08:35:21 UTC
I didn't follow all of this, but I suspect you're hitting a known bug, please see
http://download.nvidia.com/XFree86/Linux-x86_64/435.17/README/dynamicpowermanagement.html#KnownIssuesAndW6426e

> 2. There is a known issue with the audio driver due to which the audio PCI function remains in an active state from the kernel version 4.19 and up. (from commit id: 37a3a98ef601f89100e3bb657fb0e190b857028c). Upstream kernel changes are being done to fix the issue. In the interim, the Audio PCI function needs to be disabled by using the following command.
Comment 90 Przemysław Kopa 2019-10-02 09:44:31 UTC
(In reply to Maik Freudenberg from comment #89)

> > 2. There is a known issue with the audio driver due to which the audio PCI function remains in an active state from the kernel version 4.19 and up. (from commit id: 37a3a98ef601f89100e3bb657fb0e190b857028c). Upstream kernel changes are being done to fix the issue. In the interim, the Audio PCI function needs to be disabled by using the following command.

Wasn't this bug fixed already? https://github.com/torvalds/linux/commit/fc09ab7a767394f9ecdad84ea6e85d68b83c8e21
Comment 91 Daniel Drake 2019-10-02 09:52:59 UTC
(In reply to Przemysław Kopa from comment #90)
> Wasn't this bug fixed already?
> https://github.com/torvalds/linux/commit/
> fc09ab7a767394f9ecdad84ea6e85d68b83c8e21

That looks promising! Since you are the bug reporter, could you please test it?
Or have you already confirmed that it fixes the issue reported here?
Comment 92 Przemysław Kopa 2019-10-02 09:58:38 UTC
(In reply to Daniel Drake from comment #91)

> That looks promising! Since you are the bug reporter, could you please test
> it?
> Or have you already confirmed that it fixes the issue reported here?

Am I wrong, or was this patch already pulled in linux 4.20? In that case, it obviously didn't solve my problem. ;)
Comment 93 Daniel Drake 2019-10-02 10:14:41 UTC
Yes, that looks like history.

Lukas, do you have any further context on that link? Is it a historical remnant or something still relevant? Which upstream kernel changes are being done to fix it?
Comment 94 Daniel Drake 2019-10-02 10:15:02 UTC
Sorry, I got mixed up. Those last questions were meant for Maik.
Comment 95 Maik Freudenberg 2019-10-02 10:35:18 UTC
This is very fresh, about one month old. The context is that the latest Nvidia 435 driver added support for render offload and driver-controlled dynamic runtime PM (Turing only).
So I don't expect Nvidia to talk about bugs that only exist in old kernel versions.
It seems to me that the commit mentioned in the Nvidia docs breaks runtime PM on the nvhda device, and that the subsequent commit referred to by Przemysław Kopa doesn't fix it.
As to what "ongoing works on the audio driver" means, that is beyond my knowledge; we would need to ask Nvidia staff about it.
Comment 96 Maik Freudenberg 2019-10-02 12:16:50 UTC
Partly unrelated but important: just enabling runtime PM on a Turing GPU without removing the subdevices as described in the linked document has a _devastating_ effect, with idle power consumption rising to an insane 51 watts(!!).
This was first noticed when Ubuntu switched to that method to turn off the dgpu for their gpu-manager/prime-select intel.
Comment 97 Lukas Wunner 2019-10-02 12:24:52 UTC
(In reply to Przemysław Kopa from comment #87)
> (In reply to Lukas Wunner from comment #86)
> > Please provide the output of "grep .
> > /sys/bus/pci/devices/0000:01:00.1/power/*" after echoing "auto" to its
> > "control" file.
> 
> Here it is:
> /sys/bus/pci/devices/0000:01:00.1/power/async:enabled
> grep: /sys/bus/pci/devices/0000:01:00.1/power/autosuspend_delay_ms: Błąd
> wejścia/wyjścia // translates to: "Input/output error"
> /sys/bus/pci/devices/0000:01:00.1/power/control:auto
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_active_kids:0
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_active_time:105383
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_enabled:enabled
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_status:active
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time:6695
> /sys/bus/pci/devices/0000:01:00.1/power/runtime_usage:0
> /sys/bus/pci/devices/0000:01:00.1/power/wakeup:disabled

Okay, there are no child devices keeping the HDA awake and no runtime PM references are held on the HDA device either.  Why doesn't it runtime suspend?  Chances are that the ->runtime_idle hook returns -EBUSY for some reason.  We've had issues like this in the past, see bug #106597 and #106957.

I'm attaching a debug patch (the same that I've created for the other two bug reports, but rebased on v5.3).  Would you be able to apply it to your kernel, recompile, reboot, then attach the dmesg output to this bugzilla entry here?

You can add "log_buf_len=10M ignore_loglevel" to the command line to ensure that dmesg isn't truncated and contains all debug output.
Comment 98 Lukas Wunner 2019-10-02 12:32:51 UTC
Created attachment 145614 [details] [review]
HDA runtime PM debug patch for v5.3
Comment 99 Przemysław Kopa 2019-10-02 17:13:11 UTC
Here is the dmesg dump generated when running patched kernel.
Comment 100 Przemysław Kopa 2019-10-02 17:15:19 UTC
Created attachment 145615 [details]
Dmesg dump to present the problem of NVIDIA HDA not suspending correctly.
Comment 101 Przemysław Kopa 2019-10-02 17:45:38 UTC
Created attachment 145616 [details]
Dmesg dump to present the problem of NVIDIA HDA not suspending correctly.

NVIDIA HDA power control was manually set to "auto".
Comment 102 Daniel Drake 2019-10-03 03:50:22 UTC
Thanks. azx_runtime_idle() is returning EBUSY because azx_bus(chip)->codec_powered=0xf.

codec_powered is set during initialization via snd_hdac_bus_add_device(), presumably to reflect that the device is definitely powered up at initialization time.

It's unset during hdac_hdmi_runtime_suspend() (and/or during hda_codec_runtime_suspend()) via the call to snd_hdac_codec_link_down().

I guess this implies that the HDA codec (hdac_hdmi.c) is expected to be fully runtime suspended before the controller (hda_intel.c) runtime idle check is executed, and that this is not happening.
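The interplay described above can be reduced to a few lines. This is a simplified Python model, not the kernel code:

- Each codec address owns one bit of codec_powered, which is set when the codec is added.
- snd_hdac_codec_link_down() clears that bit.
- The controller's idle check refuses with -EBUSY while any bit remains.

The real azx_runtime_idle() and codec suspend paths have further conditions that are ignored here. All function names are hypothetical stand-ins.

```python
import errno


def initial_codec_powered(addrs):
    """One bit per codec address, set when the codecs are added to the bus."""
    mask = 0
    for addr in addrs:
        mask |= 1 << addr
    return mask


def codec_link_down(codec_powered, addr):
    """Stand-in for snd_hdac_codec_link_down(): clear this codec's bit."""
    return codec_powered & ~(1 << addr)


def controller_runtime_idle(codec_powered):
    """Stand-in for the codec_powered check in azx_runtime_idle()."""
    return -errno.EBUSY if codec_powered else 0


def suspend_all(addrs, link_down_on_suspend):
    """Runtime-suspend every codec, then ask whether the controller may idle."""
    mask = initial_codec_powered(addrs)
    for addr in addrs:
        if link_down_on_suspend:
            mask = codec_link_down(mask, addr)
    return mask, controller_runtime_idle(mask)
```

With four codecs whose suspend path never drops the link, this yields exactly the state seen in the dmesg: codec_powered stays 0xf and the controller reports -EBUSY.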


Under /sys/bus/pci/devices/0000:01:00.1 you should see some subdirectories that are named hdaudioC?D?. Those subdirectories in turn have power subdirectories for runtime pm control.

In addition to the steps already taken, please could you set all the hdaudio* subdevices power/control to auto too, then use grep to dump the power/ directory contents for all of the hdaudio* devices there and the controller. And let us know if this has any effect on the issue at hand.
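The request above can be scripted with a glob over the codec children. The sketch below is hedged: it assumes the "hdaudioC?D?" naming seen in this report and the HDA at 0000:01:00.1, needs root for the write, and uses hypothetical helper names.

```python
from pathlib import Path

DEVICES = Path("/sys/bus/pci/devices")


def codec_dirs(bdf="0000:01:00.1", root=DEVICES):
    return sorted((root / bdf).glob("hdaudioC*D*"))


def set_codecs_auto(bdf="0000:01:00.1", root=DEVICES):
    """Write "auto" to power/control of every hdaudioC?D? child."""
    for codec in codec_dirs(bdf, root):
        (codec / "power" / "control").write_text("auto")


def codec_power_states(bdf="0000:01:00.1", root=DEVICES):
    """Map each codec directory name to (control, runtime_status)."""
    states = {}
    for codec in codec_dirs(bdf, root):
        power = codec / "power"
        states[codec.name] = (
            (power / "control").read_text().strip(),
            (power / "runtime_status").read_text().strip(),
        )
    return states
```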

I did try 4 Asus products we currently have in the Endless lab but none of them have a nvidia HDMI controller that can be exposed via the magic bit (and their HDMI audio functionality does go through the integrated intel gpu).
Comment 103 Przemysław Kopa 2019-10-03 06:37:20 UTC
(In reply to Daniel Drake from comment #102)
> Under /sys/bus/pci/devices/0000:01:00.1 you should see some subdirectories
> that are named hdaudioC?D?. Those subdirectories in turn have power
> subdirectories for runtime pm control.
> 
> In addition to the steps already taken, please could you set all the
> hdaudio* subdevices power/control to auto too, then use grep to dump the
> power/ directory contents for all of the hdaudio* devices there and the
> controller. And let us know if this has any effect on the issue at hand.

There are four of those - all of them were already set to "auto" on boot (without me doing anything). HDA still stays active.

Here is the dump of "power" subdirectories:

/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/async:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/autosuspend_delay_ms:1000
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_active_time:226
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_status:suspended
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_suspended_time:117748
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/power/runtime_usage:0

/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/async:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/autosuspend_delay_ms:1000
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_active_time:242
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_status:suspended
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_suspended_time:117725
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D1/power/runtime_usage:0

/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/async:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/autosuspend_delay_ms:1000
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_active_time:275
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_status:suspended
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_suspended_time:117684
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D2/power/runtime_usage:0

/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/async:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/autosuspend_delay_ms:1000
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_active_time:264
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_status:suspended
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_suspended_time:117664
/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D3/power/runtime_usage:0

/sys/bus/pci/devices/0000:01:00.1/power/async:enabled
grep: /sys/bus/pci/devices/0000:01:00.1/power/autosuspend_delay_ms: Błąd wejścia/wyjścia // translates to: "Input/output error"
/sys/bus/pci/devices/0000:01:00.1/power/control:auto
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_kids:0
/sys/bus/pci/devices/0000:01:00.1/power/runtime_active_time:127936
/sys/bus/pci/devices/0000:01:00.1/power/runtime_enabled:enabled
/sys/bus/pci/devices/0000:01:00.1/power/runtime_status:active
/sys/bus/pci/devices/0000:01:00.1/power/runtime_suspended_time:5389
/sys/bus/pci/devices/0000:01:00.1/power/runtime_usage:0
/sys/bus/pci/devices/0000:01:00.1/power/wakeup:disabled
Comment 104 Lukas Wunner 2019-10-03 08:22:59 UTC
(In reply to Daniel Drake from comment #102)
> Thanks. azx_runtime_idle() is returning EBUSY because
> azx_bus(chip)->codec_powered=0xf.
> 
> codec_powered is set during initialization via snd_hdac_bus_add_device(),
> presumably to reflect that the device is definitely powered up at
> initialization time.
> 
> It's unset during hdac_hdmi_runtime_suspend() (and/or during
> hda_codec_runtime_suspend()) via the call to snd_hdac_codec_link_down().
> 
> I guess this implies that the HDA codec (hdac_hdmi.c) is expected to be
> fully runtime suspended before the controller (hda_intel.c) runtime idle
> check is executed, and that this is not happening.

Right. However codec_powered is a bitmask and the position in the bitmask is the "addr" member of struct hdac_device. We can see from the dmesg output that there are four devices C1D0 .. C1D3. So only bits 0 .. 3 in codec_powered should ever be set. How can it be that bit 15 (0xf) is set?

I'll prepare another debug patch today that instruments, with printk()s, all the places where codec_powered is changed. But my suspicion is that the bit may be set some other way. E.g. codec_mask immediately precedes codec_powered in the struct (assuming gcc doesn't change the order of the members). If we happen to set a bit > 64 in codec_mask, we may inadvertently clobber codec_powered. So I'll try to instrument changes to surrounding members as well.
Comment 105 Daniel Drake 2019-10-03 09:17:36 UTC
codec_powered has value 0xf which means bits 0,1,2,3 are set. Bit 15 would be 0x8000.
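Decoding the mask makes the correspondence explicit: each set bit is a codec address, so 0xf covers hdaudioC1D0 through hdaudioC1D3. Below is a tiny sketch. The helper names are hypothetical, and card number 1 is taken from the sysfs paths above.

```python
def set_bits(mask):
    """Positions of the bits set in `mask`, lowest first."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


def codec_names(mask, card=1):
    """Translate a codec_powered-style bitmask into hdaudioC?D? names."""
    return [f"hdaudioC{card}D{addr}" for addr in set_bits(mask)]
```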

But I agree with the next step of looking closer at accesses to this variable. Thanks for jumping on that!
Comment 106 Lukas Wunner 2019-10-03 16:11:30 UTC
Created attachment 145627 [details] [review]
HDA runtime PM debug patch #2 for v5.3
Comment 107 Lukas Wunner 2019-10-03 16:18:33 UTC
(In reply to Daniel Drake from comment #105)
> codec_powered has value 0xf which means bits 0,1,2,3 are set. Bit 15 would
> be 0x8000.

Ugh, indeed, thanks for having my back Daniel, I should stay away from bugzilla when half asleep. %-)

> But I agree with the next step of looking closer at accesses to this
> variable. Thanks for jumping on that!

I've just attached another debug patch, which is a variation of one I had done for bug #106597, rebased on v5.3. Przemysław, could you try this one, again setting "control" to "auto" on the HDA PCI device, and post the resulting dmesg? Thanks!
Comment 108 Daniel Drake 2019-10-04 00:21:25 UTC
If codec devices are always child devices of the controller, then I also wonder if codec_powered could be completely removed.

Seems like the PM core already ensures the children are inactive before suspending the controller:

> The idle callback (a subsystem-level one, if present, or the driver one) is
> executed by the PM core whenever the device appears to be idle, which is
> indicated to the PM core by two counters, the device's usage counter and the
> counter of 'active' children of the device.
> 
>   * If any of these counters is decreased using a helper function provided by
>     the PM core and it turns out to be equal to zero, the other counter is
>     checked.  If that counter also is equal to zero, the PM core executes the
>     idle callback with the device as its argument.
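The rule quoted above can be illustrated with a toy model of the two counters. If the codecs are children of the controller, the controller's idle callback only runs once the last codec has suspended; whether the callback then agrees to suspend is up to the driver. This is an illustrative Python class, not kernel code, and its method names are hypothetical.

```python
class ToyRuntimePM:
    """Toy model of the usage counter and active-children counter."""

    def __init__(self, idle_callback):
        self.usage_count = 0       # references held, like pm_runtime_get()/put()
        self.active_children = 0   # children that are not runtime-suspended
        self.idle_callback = idle_callback

    def _check_idle(self):
        # The idle callback runs only when *both* counters are zero.
        if self.usage_count == 0 and self.active_children == 0:
            return self.idle_callback()
        return None

    def get(self):
        self.usage_count += 1

    def put(self):
        self.usage_count -= 1
        return self._check_idle()

    def child_resumed(self):
        self.active_children += 1

    def child_suspended(self):
        self.active_children -= 1
        return self._check_idle()
```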
Comment 109 Przemysław Kopa 2019-10-04 11:45:48 UTC
Created attachment 145640 [details]
Dmesg dump to present the problem of NVIDIA HDA not suspending correctly #2.
Comment 110 Lukas Wunner 2019-10-05 10:22:50 UTC
(In reply to Przemysław Kopa from comment #109)
> Created attachment 145640 [details]
> Dmesg dump to present the problem of NVIDIA HDA not suspending correctly #2.

Thanks. This is the same issue as bug #106957, only that one was for AMD cards and yours is an Nvidia. It was fixed by commit 57cb54e53bdd ("ALSA: hda - Force to link down at runtime suspend on ATI/AMD HDMI").

If you add "codec->link_down_at_suspend = 1;" to patch_nvhdmi() in sound/pci/hda/patch_hdmi.c, the issue may go away.

The only question is whether your card's revision_id is listed in snd_hda_id_hdmi[] such that patch_nvhdmi() is executed for your card. What does "cat /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/revision_id" say?
Comment 111 Przemysław Kopa 2019-10-07 17:51:17 UTC
(In reply to Lukas Wunner from comment #110)
> What does "cat /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/revision_id" say?
It says: 0x100100

> If you add "codec->link_down_at_suspend = 1;" to patch_nvhdmi() in
> sound/pci/hda/patch_hdmi.c, the issue may go away.
> 
> The only question is whether your card's revision_id is listed in
> snd_hda_id_hdmi[] such that patch_nvhdmi() is executed for your card.

I added "HDA_CODEC_ENTRY(0x10de0403, "GPU 0403 HDMI/DP", patch_nvhdmi)" to snd_hda_id_hdmi[] (the PCI ID of my Nvidia HDA wasn't there) and "codec->link_down_at_suspend = 1;" to patch_nvhdmi(). With those changes, the dGPU and HDA suspended normally (after echoing "auto" to the HDA control file), so I think this is definitely the right track!
Comment 112 Lukas Wunner 2019-10-07 18:12:49 UTC
(In reply to Przemysław Kopa from comment #111)
> (In reply to Lukas Wunner from comment #110)
> > What does "cat /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/revision_id" say?
> It says: 0x100100
> 
> > If you add "codec->link_down_at_suspend = 1;" to patch_nvhdmi() in
> > sound/pci/hda/patch_hdmi.c, the issue may go away.
> > 
> > The only question is whether your card's revision_id is listed in
> > snd_hda_id_hdmi[] such that patch_nvhdmi() is executed for your card.
> 
> I added "HDA_CODEC_ENTRY(0x10de0403, "GPU 0403 HDMI/DP", patch_nvhdmi)" to
> snd_hda_id_hdmi[] (PCI ID of my Nvidia HDA wasn't there) and
> "codec->link_down_at_suspend = 1;" to patch_nvhdmi(). With those changes
> dGPU and HDA suspended normally (after echoing "auto" to HDA control file),
> so I think that this is definitely the right track!

Glad to hear. You don't seem to have any commits in the kernel so far. Would you like to try to turn these changes into a proper patch? If not, I'll gladly create and submit the patch myself, but mentoring someone else to make their first contribution is more beneficial to the community, hence my question. You could attach the patch to this bugzilla and we can give you comments before you submit it to the list.
Comment 113 Przemysław Kopa 2019-10-16 16:24:31 UTC
(In reply to Lukas Wunner from comment #112)
>
> Glad to hear. You don't seem to have any commits in the kernel so far. Would
> you like to try and bake these changes into a proper patch? If not I'll
> gladly create and submit the patch myself but mentoring someone else make
> their first contribution is more beneficial to the community, hence my
> question.

Lukas, could you please handle it this time? Sorry for not posting sooner.
Comment 114 Lukas Wunner 2019-10-16 20:38:54 UTC
(In reply to Przemysław Kopa from comment #113)
> (In reply to Lukas Wunner from comment #112)
> > Glad to hear. You don't seem to have any commits in the kernel so far. Would
> > you like to try and bake these changes into a proper patch? If not I'll
> > gladly create and submit the patch myself but mentoring someone else make
> > their first contribution is more beneficial to the community, hence my
> > question.
> 
> Lukas, could you please handle it this time? Sorry for not posting sooner.

Sure thing.

Just one question, you wrote that you had to add "HDA_CODEC_ENTRY(0x10de0403, "GPU 0403 HDMI/DP", patch_nvhdmi)" to snd_hda_id_hdmi[] with the rationale that the "PCI ID of my Nvidia HDA wasn't there".

This confuses me because the PCI device ID of the HDA controller is "0bea" and "0403" are the 16 most significant bits of the PCI class ID.

HDA_CODEC_ENTRY() needs to match for the 32-bit HD audio vendor ID. Just to double-check, could you execute "cat /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/vendor_id" and post the result here? Is it really 0x10de0403? Thanks!
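The two IDs live in different namespaces, which a few lines make visible:

- A 32-bit HDA codec vendor_id splits into a 16-bit vendor half (0x10de, Nvidia) and a 16-bit device half.
- The upper 16 bits of the 24-bit PCI class code 0x040300 (multimedia / audio device) are 0x0403, the value that ended up in the table entry.

The dictionary below is a hypothetical stand-in for snd_hda_id_hdmi[], not the kernel's data structure, and the helper names are illustrative.

```python
def split_hda_vendor_id(vendor_id):
    """Split a 32-bit HDA codec vendor_id into (vendor, device) halves."""
    return vendor_id >> 16, vendor_id & 0xFFFF


def pci_class_upper16(class_code):
    """Drop the prog-if byte of a 24-bit PCI class code 0xCCSSPP."""
    return class_code >> 8


def find_patch(table, vendor_id):
    """Exact 32-bit lookup, like matching an HDA_CODEC_ENTRY() id."""
    return table.get(vendor_id)
```

An entry keyed on 0x10de0403 would therefore never match a codec that reports 0x10de0014.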
Comment 115 Przemysław Kopa 2019-10-17 13:15:52 UTC
(In reply to Lukas Wunner from comment #114)
> HDA_CODEC_ENTRY() needs to match the 32-bit vendor ID of the HD audio
> codec, not a PCI ID. Just to double-check, could you execute "cat
> /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/vendor_id" and post the result
> here? Is it really 0x10de0403? Thanks!

cat /sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/vendor_id
0x10de0014

You're right, I didn't fully understand what to put there. ;)
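
The reported value can also be read programmatically. The following sketch uses hypothetical helper names; the path is the one from this thread, and the codec directory name hdaudioC1D0 may differ on other systems. It parses the vendor_id file and splits the value into its 0x10de vendor half and 0x0014 device half, i.e. the 32-bit value an HDA_CODEC_ENTRY() would have to match.

```python
from pathlib import Path

# Path taken from this thread; the codec directory name may differ elsewhere.
CODEC_VENDOR_ID = Path("/sys/bus/pci/devices/0000:01:00.1/hdaudioC1D0/vendor_id")

def parse_codec_vendor_id(text: str) -> int:
    """Parse sysfs vendor_id contents such as "0x10de0014" (newline allowed)."""
    return int(text.strip(), 16)

def read_codec_vendor_id(path: Path = CODEC_VENDOR_ID) -> int:
    """Read and parse a codec's vendor_id attribute."""
    return parse_codec_vendor_id(path.read_text())

def describe_codec_id(codec_id: int) -> str:
    """Format a 32-bit codec ID as its vendor and device halves."""
    return f"vendor 0x{codec_id >> 16:04x}, device 0x{codec_id & 0xFFFF:04x}"
```

On the reporter's machine, describe_codec_id(read_codec_vendor_id()) would give "vendor 0x10de, device 0x0014".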
Comment 116 Lukas Wunner 2019-10-20 07:41:06 UTC
@Przemysław Kopa:

The fix was applied by Takashi Iwai on Thursday Oct 17 with commit 94989e318b2f, it was merged to Linus' tree on Friday Oct 18 and will thus be part of v5.4-rc4 due out later today. It should appear in v5.3-stable within 1 or 2 weeks. You may want to double-check that the issue is gone with this fix.

There's one problem remaining: you shouldn't have to manually echo "auto" to the HDA's control file, because we call pm_runtime_allow() on the HDA device in drivers/pci/quirks.c:quirk_gpu_hda() -> pci_create_device_link(). Something must be calling pm_runtime_forbid() afterwards; perhaps this is triggered from user space on Arch Linux. I'm attaching a little debug patch which logs a stack trace to dmesg whenever pm_runtime_allow() / _forbid() is called for a device. Feel free to attach dmesg output with this patch applied and I'll be happy to take a look at it.
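
For reference, the manual workaround mentioned here amounts to writing "auto" to the device's power/control attribute; the shell equivalent, run as root, is echo auto > /sys/bus/pci/devices/0000:01:00.1/power/control. A minimal Python sketch, assuming the device address 0000:01:00.1 from this thread and root privileges (the helper name is hypothetical):

```python
from pathlib import Path

# Device address taken from this thread; adjust for other machines.
HDA_CONTROL = Path("/sys/bus/pci/devices/0000:01:00.1/power/control")

def set_runtime_pm(control_file: Path, mode: str) -> None:
    """Write "auto" (allow runtime PM) or "on" (forbid it) to power/control.

    Writing to the real sysfs file requires root.
    """
    if mode not in ("auto", "on"):
        raise ValueError(f"unsupported power/control value: {mode!r}")
    control_file.write_text(mode)
```

Calling set_runtime_pm(HDA_CONTROL, "auto") as root has the same effect as the echo command; it should not be needed at all once nothing in user space forbids runtime PM.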
Comment 117 Lukas Wunner 2019-10-20 07:43:43 UTC
Created attachment 145778
Debug patch to log invocations of pm_runtime_forbid()
Comment 118 Przemysław Kopa 2019-10-27 17:58:40 UTC
(In reply to Lukas Wunner from comment #116)
> @Przemysław Kopa:
> 
> The fix was applied by Takashi Iwai on Thursday Oct 17 with commit
> 94989e318b2f, it was merged to Linus' tree on Friday Oct 18 and will thus be
> part of v5.4-rc4 due out later today. It should appear in v5.3-stable within
> 1 or 2 weeks. You may want to double-check that the issue is gone with this
> fix.

Thanks, I can happily confirm that the issue is fixed in v5.4-rc4. :)

> There's one problem remaining: you shouldn't have to manually echo "auto" to
> the HDA's control file, because we call pm_runtime_allow() on the HDA device
> in drivers/pci/quirks.c:quirk_gpu_hda() -> pci_create_device_link().
> Something must be calling pm_runtime_forbid() afterwards; perhaps this is
> triggered from user space on Arch Linux. I'm attaching a little debug patch
> which logs a stack trace to dmesg whenever pm_runtime_allow() / _forbid() is
> called for a device. Feel free to attach dmesg output with this patch
> applied and I'll be happy to take a look at it.

I'm attaching a dmesg dump; the last stack trace was generated after I echoed "auto" to the HDA control file.
Comment 119 Przemysław Kopa 2019-10-27 18:00:31 UTC
Created attachment 145830
Dmesg dump
Comment 120 Lukas Wunner 2019-10-27 18:33:24 UTC
(In reply to Przemysław Kopa from comment #118)
> (In reply to Lukas Wunner from comment #116)
> > There's one problem remaining: you shouldn't have to manually echo "auto" to
> > the HDA's control file, because we call pm_runtime_allow() on the HDA device
> > in drivers/pci/quirks.c:quirk_gpu_hda() -> pci_create_device_link().
> > Something must be calling pm_runtime_forbid() afterwards; perhaps this is
> > triggered from user space on Arch Linux. I'm attaching a little debug patch
> > which logs a stack trace to dmesg whenever pm_runtime_allow() / _forbid() is
> > called for a device. Feel free to attach dmesg output with this patch
> > applied and I'll be happy to take a look at it.
> 
> I'm attaching a dmesg dump; the last stack trace was generated after I
> echoed "auto" to the HDA control file.

Okay, the culprit is a tool called "tlp", which disables runtime PM on the HDA controller via sysfs:

[    8.472292] snd_hda_intel 0000:01:00.1: pm_runtime_forbid
[    8.474196] CPU: 0 PID: 494 Comm: tlp Not tainted 5.4.0-rc4-mainline #1
[    8.477943] Call Trace:
[    8.477952]  dump_stack+0x5c/0x80
[    8.477957]  pm_runtime_forbid.cold+0x1b/0x38
[    8.477960]  control_store+0x78/0x80
[    8.477964]  kernfs_fop_write+0x10e/0x190
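
The trace shows tlp's sysfs write passing through kernfs_fop_write() into control_store(), the store handler of the power/control attribute. In drivers/base/power/sysfs.c, that handler calls pm_runtime_allow() for "auto" and pm_runtime_forbid() for "on", and rejects anything else with -EINVAL. The Python model below is a simplified illustration of that dispatch, not kernel code: it returns the name of the call instead of making it.

```python
import errno

def sysfs_streq(s1: str, s2: str) -> bool:
    """Approximates the kernel's sysfs_streq(): equal, ignoring a trailing newline."""
    return s1.removesuffix("\n") == s2.removesuffix("\n")

def control_store(buf: str) -> str:
    """Model of control_store(): name the runtime PM call a write triggers.

    The real handler calls the function on the device; this model only
    returns its name.
    """
    if sysfs_streq(buf, "auto"):
        return "pm_runtime_allow"
    if sysfs_streq(buf, "on"):
        return "pm_runtime_forbid"
    raise OSError(errno.EINVAL, "invalid value for power/control")
```

So a user-space write of "on" (with or without a trailing newline) is exactly what appears as pm_runtime_forbid in the log above.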

I'm not familiar with "tlp" at all, but according to this page...

https://linrunner.de/en/tlp/docs/tlp-configuration.html#audio

... I suspect you may have set "SOUND_POWER_SAVE_CONTROLLER=N" or "SOUND_POWER_SAVE_ON_AC=0" in /etc/default/tlp. If so, check whether changing those settings fixes the issue.
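
After adjusting the tlp settings, one way to confirm the result is to read the HDA function's power/control and power/runtime_status attributes. A hedged Python sketch with hypothetical helper names; on the machine from this thread the device directory would be /sys/bus/pci/devices/0000:01:00.1.

```python
from pathlib import Path

def runtime_pm_state(dev_dir: Path) -> dict[str, str]:
    """Read power/control and power/runtime_status for a device directory."""
    power = dev_dir / "power"
    return {
        "control": (power / "control").read_text().strip(),
        "runtime_status": (power / "runtime_status").read_text().strip(),
    }

def runtime_pm_allowed(state: dict[str, str]) -> bool:
    """Return True if control is "auto" (runtime PM allowed), False for "on"."""
    return state["control"] == "auto"
```

With runtime PM allowed, control should read "auto", and runtime_status should typically move to "suspended" once the HDMI audio function is idle.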

Some more info on "tlp" can be found here:

https://wiki.archlinux.org/index.php/TLP

Hope that helps!
Comment 121 Przemysław Kopa 2019-10-28 06:57:03 UTC
(In reply to Lukas Wunner from comment #120)
> Okay, the culprit is a tool called "tlp", which disables runtime PM on the
> HDA controller via sysfs:

Thanks, I fixed it by adding RUNTIME_PM_BLACKLIST="01:00.1" to the tlp config file; it prevents tlp from controlling runtime PM for that device.
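
The address in that setting is the short bus:device.function form of the sysfs name 0000:01:00.1. The sketch below is hypothetical and is not how tlp itself is implemented; it only illustrates matching a full sysfs address against a space-separated list of such short addresses, assuming list entries omit a 0000 PCI domain.

```python
# Hypothetical helpers (not tlp's code): match a full PCI address against a
# space-separated exclusion list that may use the short "bus:dev.fn" form.

def short_pci_address(full: str) -> str:
    """Drop a 0000 PCI domain prefix, e.g. '0000:01:00.1' -> '01:00.1'."""
    return full[5:] if full.startswith("0000:") else full

def is_excluded(full_address: str, blacklist: str) -> bool:
    """True if the device appears in the list, in full or short form."""
    entries = blacklist.split()
    return full_address in entries or short_pci_address(full_address) in entries
```

For example, is_excluded("0000:01:00.1", "01:00.1") is True, while the GPU function 0000:01:00.0 is not covered by that list.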
Comment 122 Martin Peres 2019-12-04 08:44:32 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further in the new bug via this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/97.
