Summary: | Acer Aspire V7-582PG (Haswell, GTX 750M) fails to power off GPU with runtime PM
---|---
Product: | xorg
Reporter: | Rick Kerkhof <rick>
Component: | Driver/nouveau
Assignee: | Nouveau Project <nouveau>
Status: | RESOLVED MOVED
QA Contact: | Xorg Project Team <xorg-team>
Severity: | normal
Priority: | medium
CC: | peter
Version: | unspecified
Hardware: | x86-64 (AMD64)
OS: | Linux (All)

Created attachment 127484 [details]
acpidump output
Created attachment 127485 [details]
lspci -tnnv output
Created attachment 127486 [details]
New dmesg output after manually turning on power management
So Lekensteyn asked me to run a few commands to also shut off the bus the GPU is on.
If I do so and then run powertop on battery, intel_pstate complains about turbo not being available and a USB device disconnects. A while later nouveau resumes its kernel object trees, Bluetooth reconnects, and nouveau suspends again.
The commands are:
# echo auto > /sys/bus/pci/devices/0000\:00\:1c.4/power/control
# grep . /sys/bus/pci/devices/0000:0{0:1c.4,1:00.0}/power/{control,runtime_status}
The latter returns auto, suspended, auto, suspended before running powertop, and on, active, auto, suspended after running powertop.
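
The grep check above can be wrapped in a small helper so the before/after powertop states are easier to compare. This is only a sketch: the function name pm_status and the SYSFS_PCI override are made up for illustration and are not part of any tool mentioned in this report. The device addresses are the ones given above.

```shell
#!/bin/sh
# SYSFS_PCI is a hypothetical override (useful for testing); on a real
# system it defaults to the standard sysfs PCI device directory.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# pm_status DEV...: print power/control and power/runtime_status per device.
pm_status() {
    for dev in "$@"; do
        dir="$SYSFS_PCI/$dev/power"
        if [ ! -r "$dir/control" ] || [ ! -r "$dir/runtime_status" ]; then
            echo "$dev: not found"
            continue
        fi
        echo "$dev: control=$(cat "$dir/control") runtime_status=$(cat "$dir/runtime_status")"
    done
}

# Root port and GPU addresses as given in this report.
pm_status 0000:00:1c.4 0000:01:00.0
```

Running it once before and once after starting powertop should show whether the root port has flipped from auto/suspended to on/active.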
Created attachment 127487 [details]
lspci -nnvvv output
Adding pcie_port_pm=off to my kernel command line causes the card to turn off and powertop to report ~7.5W of power usage. According to Lekensteyn this reverts nouveau to the 4.7 and lower behavior of using DSM, so I think this is a regression from using the new method.

Booting without pcie_port_pm=off while blacklisting nouveau on boot, then executing:
# echo 0 > /sys/bus/pci/devices/0000:01:00.0/d3cold_allowed && modprobe nouveau
also causes powertop to report a ~7.5W value.

Just to add extra info here, this problem also happens with bbswitch: https://github.com/Bumblebee-Project/bbswitch/issues/140

Guys, do you know if this is really a bug in Linux or a feature? I mean, would changes to fix this problem be on the kernel side (PM team) or on the kernel interface side (vgaswitcheroo / bbswitch)?

Regards

Pablo, the issue that bbswitch has is different from the one reported here: bbswitch has not been updated for 4.8, so it requires the pcie_port_pm=off workaround.

There are more details for this bug from the reporter on IRC (search for NanoSector): https://people.freedesktop.org/~cbrill/dri-log/index.php?channel=nouveau&date=2016-10-22

In particular, Rick reported that the issue apparently also appears with older kernels, including 4.3 to 4.8. This is a significant and surprising result because kernel 4.8 plus pcie_port_pm=off (or the d3cold_allowed change) should behave the same as 4.7 or earlier.

Rick, can you re-test it with 4.7? It also occurs to me that older kernels might not support your GPU, so be sure to keep a dmesg around.

Sure, I'll do another test run with 4.7 this week.

Hmm, I just installed Linux 4.7.6 and ran it without any additional kernel parameters, and I am getting results close to ~7.5W too, so it seems to work there.
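
When comparing results across kernels it helps to record whether the pcie_port_pm=off workaround was actually active for a given boot. A minimal sketch, assuming the kernel command line is readable from /proc/cmdline; the function name has_port_pm_off is hypothetical:

```shell
#!/bin/sh
# has_port_pm_off [FILE]: succeed if the kernel command line in FILE
# (default /proc/cmdline) contains the exact parameter pcie_port_pm=off.
has_port_pm_off() {
    cmdline_file="${1:-/proc/cmdline}"
    [ -r "$cmdline_file" ] || return 2
    # Put each parameter on its own line, then match a whole line exactly.
    tr -s ' \t' '\n\n' < "$cmdline_file" | grep -qx 'pcie_port_pm=off'
}

if has_port_pm_off; then
    echo "pcie_port_pm=off is set"
else
    echo "pcie_port_pm=off is not set"
fi
```

Noting this output next to each powertop reading makes it clearer which of the configurations discussed above a measurement belongs to.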
So 4.7 and before used the "DSM" method on runtime-suspend:
- \_SB.PCI0.RP05.PEGP._DSM would be invoked to enable Optimus
- \_SB.PCI0.RP05.PEGP._PS3 is then invoked which would enter D3cold
(note, this method is still used in 4.8 on older laptops or with the pcie_port_pm=off kernel option)

Since 4.8, _DSM is not called anymore by nouveau (when support from the PCI core is detected) and this sequence should instead happen:
- \_SB.PCI0.RP05.PEGP._PS3 (does nothing besides updating _STA)
- PCIe core removes power for the PCIe port since all its children are in D3 and are willing to transition to D3cold. It does so by invoking \NVP3._OFF (where \NVP3 is the power resource from \_SB.PCI0.RP05._PR3)

That is how I think it should work in theory, but on Rick's laptop running 4.8.4, /sys/bus/devices/0000:1c.4/firmware_node/ does not have power_resources_D0 devices (which I do have on my own laptop for 0000:01:0).

The SSDT1 of Rick's Acer laptop shows this structure:

    If (\_OSI ("Windows 2013"))
    {
        Scope (\_SB.PCI0.RP05)
        {
            //...
            Name (_PR0, Package (0x01)  // _PR0: Power Resources for D0
            {
                NVP3
            })
            Name (_PR2, Package (0x01)  // _PR2: Power Resources for D2
            {
                NVP2
            })
            Name (_PR3, Package (0x01)  // _PR3: Power Resources for D3hot
            {
                NVP3
            })
            // ...
            Method (_PS0, 0, NotSerialized)  // _PS0: Power State 0
            {
            }
            Method (_PS3, 0, NotSerialized)  // _PS3: Power State 3
            {
            }
        }
        Name (MSD3, Zero)
        PowerResource (NVP3, 0x00, 0x0000)
        {
            Name (_STA, One)  // _STA: Status
            // ...
            Method (_ON, 0, NotSerialized)  // _ON_: Power On
            {
                // ...
            }
            Method (_OFF, 0, NotSerialized)  // _OFF: Power Off
            {
                // ...
            }
        }

The dmesg does show "ACPI: Power Resource [NVP3] (on)", so I guess that the methods are found. It is a mystery to me why the "power_resources_Dx" files are not created, possibly breaking PM.

At the moment it looks like an ACPI core bug which manifested in nouveau. See https://lists.freedesktop.org/archives/nouveau/2016-October/026395.html and the replies. I'll post a workaround patch soon.
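
The missing power_resources_Dx entries can be checked on other machines with a short script. This is a sketch under assumptions: it uses the full sysfs address of the root port from the earlier commands (0000:00:1c.4), and the function name list_power_resources and the SYSFS_PCI override are hypothetical.

```shell
#!/bin/sh
# SYSFS_PCI is a hypothetical override for testing; defaults to real sysfs.
SYSFS_PCI="${SYSFS_PCI:-/sys/bus/pci/devices}"

# list_power_resources DEV: print the power_resources_D* entries under the
# ACPI firmware node of PCI device DEV, or a note if there are none.
list_power_resources() {
    node="$SYSFS_PCI/$1/firmware_node"
    found=0
    for entry in "$node"/power_resources_D*; do
        # An unmatched glob stays literal, so check that the entry exists.
        [ -e "$entry" ] || continue
        echo "$1: ${entry##*/}"
        found=1
    done
    [ "$found" -eq 1 ] || echo "$1: no power_resources_D* entries"
}

list_power_resources 0000:00:1c.4
```

On a machine where the ACPI core registered the power resources for the port, this should list entries such as power_resources_D0; on the affected laptop it would presumably print the "no entries" line.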
Created attachment 127942 [details] [review]
Disable d3cold on bridge when falling back to _DSM

The workaround patch has been merged in v4.9-rc3-34-gb0a6af8 (and backported to 4.8.7 via v4.8.6-109-g7290da4) but apparently it broke (system?) suspend/resume according to the reporter.

Before the workaround patch:
- _PR3 method is found, so nouveau assumes that PCI core takes care of D3cold.
- Due to an ACPICA bug, PCI core fails to power off the device via runtime PM: https://bugs.acpica.org/show_bug.cgi?id=1333

After the workaround patch I guess that this happens:
- _PR3 method is found, but unusable. Nouveau falls back to _DSM.
- Due to the above ACPICA bug, the power resources are not owned by any device. I guess that Linux then decides to power off the "unnecessary" power resource after system resume. (I saw something like this in a dmesg for a similar SSDT)
- At this point I would guess that nouveau then follows the old DSM method, but then I am confused because pcie_port_pm=off (or pre-4.8 kernels) supposedly have the same issue with this power resource.

If pcie_port_pm=off helps, then the attached patch should also work (no pcie_port_pm=off needed). Can you give it a try on top of v4.8.7?

Rick, were you actually able to suspend the system with kernel 4.7 and nouveau? Bug 98582 has a similar acpidump and claims that v4.7 also failed to suspend (actually, resume).

Created attachment 127999 [details]
attachment-21565-0.html

On Sun, 13 Nov 2016 at 00:07, <bugzilla-daemon@freedesktop.org> wrote:
> Comment #14 <https://bugs.freedesktop.org/show_bug.cgi?id=98398#c14> on bug 98398 <https://bugs.freedesktop.org/show_bug.cgi?id=98398> from Peter Wu <peter@lekensteyn.nl>
>
> Rick, were you actually able to suspend the system with kernel 4.7 and nouveau? Bug 98582 <https://bugs.freedesktop.org/show_bug.cgi?id=98582> has a similar acpidump and claims that v4.7 also failed to suspend (actually, resume).

Using pcie_port_pm=off on kernel 4.8 does not make resuming work; it still hangs on resuming with a black screen and no backlight (and sometimes a pointer with a black background).

Rick reported that system suspend did not work before the patch either, so there is no regression in that sense.

ACPICA developers are faster than expected; can you test these three patches: https://bugs.acpica.org/show_bug.cgi?id=1333#c45

-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity. You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/xorg/driver/xf86-video-nouveau/issues/293.
Created attachment 127483 [details]
dmesg output

Using Arch Linux and kernel 4.8.3, I am observing much higher power usage in powertop when using Nouveau with vga_switcheroo (~13 W) as opposed to NVIDIA/Bumblebee with bbswitch (~7.5 W). I initially noticed this because the battery drained much faster and the fans started to spin, whereas they otherwise stayed idle. Attached is my dmesg log.