Bug 101404 - GTX 970M (GM204-A) not powered off when not in use (DynPwr instead of DynOff)
Summary: GTX 970M (GM204-A) not powered off when not in use (DynPwr instead of DynOff)
Status: CLOSED NOTABUG
Alias: None
Product: xorg
Classification: Unclassified
Component: Driver/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Xorg Project Team
Reported: 2017-06-13 12:32 UTC by Benny Ammitzbøll
Modified: 2018-04-06 14:24 UTC

Attachments
dmesg log (69.57 KB, text/plain), 2017-06-13 12:32 UTC, Benny Ammitzbøll
Xorg log (33.02 KB, text/plain), 2017-06-13 12:33 UTC, Benny Ammitzbøll

Description Benny Ammitzbøll 2017-06-13 12:32:41 UTC
Created attachment 131921
dmesg log

Gentoo Linux, kernel 4.9.16, using vgaswitcheroo, modesetting and nouveau.

My laptop is a Tuxedo XC1706 (aka Schenker XMG P706 aka Clevo P671RG) with an i7-6700HQ and a GTX 970M. The external outputs (HDMI and 2 x DP) are wired to the NVIDIA card. Only the laptop screen is attached to the Intel HD graphics.

Issue 1 (minor):
----------------
On battery without pcie_port_pm=off, vgaswitcheroo reports DynOff, but powertop shows ~26W (i.e. the NVIDIA card is not actually turned off).

On battery with pcie_port_pm=off, vgaswitcheroo reports DynOff and powertop shows 16-18W (i.e. the NVIDIA card is turned off).

Issue 2:
--------
On AC with pcie_port_pm=off, I always see DynPwr even though the NVIDIA card is unused (see below). Shouldn't this change to DynOff when the card is unused? Is there a workaround to force it to DynOff?

tux_xc1706 ~ # cat /sys/kernel/debug/vgaswitcheroo/switch
0:IGD:+:Pwr:0000:00:02.0
1:DIS: :DynPwr:0000:01:00.0
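
For scripting, the switch file can be parsed field by field. A minimal Python sketch, assuming each line has the form index:name:active:state:PCI address as in the output above (SwitchEntry and parse_switch are illustrative names, not part of any tool):

```python
from typing import List, NamedTuple


class SwitchEntry(NamedTuple):
    index: int
    name: str         # e.g. "IGD" (integrated) or "DIS" (discrete)
    active: bool      # True when the third field is "+"
    state: str        # e.g. "Pwr", "DynPwr", "DynOff"
    pci_address: str  # e.g. "0000:01:00.0"


def parse_switch(text: str) -> List[SwitchEntry]:
    """Parse the contents of /sys/kernel/debug/vgaswitcheroo/switch."""
    entries = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # The PCI address itself contains colons, so split at most 4 times.
        index, name, active, state, address = line.split(":", 4)
        entries.append(
            SwitchEntry(int(index), name, active == "+", state, address.strip())
        )
    return entries


# Sample taken from the output shown in this report.
sample = "0:IGD:+:Pwr:0000:00:02.0\n1:DIS: :DynPwr:0000:01:00.0\n"
for entry in parse_switch(sample):
    print(entry.name, entry.state, entry.pci_address)
```

On a live system the text would come from reading the debugfs file as root rather than from the sample string.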

tux_xc1706 ~ # xrandr --listproviders
Providers: number : 2
Provider 0: id: 0x7a cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 3 outputs: 1 associated providers: 0 name:modesetting
Provider 1: id: 0x46 cap: 0xf, Source Output, Sink Output, Source Offload, Sink Offload crtcs: 4 outputs: 3 associated providers: 0 name:modesetting

tux_xc1706 ~ # xrandr 
Screen 0: minimum 320 x 200, current 1920 x 1080, maximum 8192 x 8192
eDP-1 connected primary 1920x1080+0+0 (normal left inverted right x axis y axis) 382mm x 215mm
   1920x1080     60.02*+
   1400x1050     59.98  
   1280x1024     60.02  
   1280x960      60.00  
   1024x768      60.04    60.00  
   960x720       60.00  
   928x696       60.05  
   896x672       60.01  
   800x600       60.00    60.32    56.25  
   700x525       59.98  
   640x512       60.02  
   640x480       60.00    59.94  
   512x384       60.00  
   400x300       60.32    56.34  
   320x240       60.05
Comment 1 Benny Ammitzbøll 2017-06-13 12:33:45 UTC
Created attachment 131922
Xorg log
Comment 2 Benny Ammitzbøll 2017-06-13 13:03:17 UTC
Workaround to force DynOff when on AC and the NVIDIA card is not in use: in powertop, toggle all "Bad" tunables to "Good" (probably only the entry for the NVIDIA PCI device needs toggling).
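
The powertop "Runtime PM for PCI Device" tunable corresponds to the device's power/control file in sysfs, where "auto" allows runtime suspend and "on" keeps the device powered. A minimal sketch of the same toggle without powertop, assuming the card is at 0000:01:00.0 as reported by vgaswitcheroo and that the script runs as root (set_runtime_pm and its sysfs_root parameter are illustrative names):

```python
from pathlib import Path


def set_runtime_pm(address: str, mode: str = "auto", sysfs_root: str = "/sys") -> Path:
    """Write "auto" or "on" to a PCI device's power/control file."""
    if mode not in ("auto", "on"):
        raise ValueError(f"mode must be 'auto' or 'on', got {mode!r}")
    control = (
        Path(sysfs_root) / "bus" / "pci" / "devices" / address / "power" / "control"
    )
    control.write_text(mode)
    return control


# Usage on the affected machine (as root):
#   set_runtime_pm("0000:01:00.0")
```

The sysfs_root parameter exists only so the function can be tried against a scratch directory before touching the real /sys tree.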
Comment 3 Peter Wu 2017-07-04 12:19:39 UTC
When the nouveau driver is loaded, runtime PM for the Nvidia PCI device (01:00.0 in your case) should be enabled. Without pcie_port_pm=off this will also enable runtime PM for the parent device, a PCIe port.

Any reason why you boot with acpi_osi=! acpi_osi=Linux? If you want to prevent a lockup that occurs with Clevo P6xxRx models, try acpi_osi="!Windows 2015"
(see https://bugzilla.kernel.org/show_bug.cgi?id=156341)
Comment 4 Benny Ammitzbøll 2017-07-05 11:14:10 UTC
(In reply to Peter Wu from comment #3)
> When the nouveau driver is loaded, runtime PM for the Nvidia PCI device
> (01:00.0 in your case) should be enabled.

Yes, I agree. Why is the Nvidia card not going to DynOff when on AC and not being used then? I can force it by using powertop, but why does this not happen automatically?

> Without pcie_port_pm=off this will
> also enable runtime PM for the parent device, a PCIe port.

Ok, but then I don't understand why I get a higher power consumption without pcie_port_pm=off. I mean, the parent device should power down the entire PCIe port incl. the Nvidia card? And before Linux kernel 4.8 (I think) pcie_port_pm=off was not needed, so what happened?

> Any reason why you boot with acpi_osi=! acpi_osi=Linux? If you want to
> prevent a lockup that occurs with Clevo P6xxRx models, try
> acpi_osi="!Windows 2015"
> (see https://bugzilla.kernel.org/show_bug.cgi?id=156341)

Because while acpi_osi="!Windows 2015" does prevent the lockup, I still have some Fn keys that are not working (notably those that adjust screen brightness, but others as well). With acpi_osi=! acpi_osi=Linux all of my Fn keys are working, incl. no lockup.
Comment 5 Peter Wu 2017-07-06 10:42:54 UTC
(In reply to Benny Ammitzbøll from comment #4)
> (In reply to Peter Wu from comment #3)
> > When the nouveau driver is loaded, runtime PM for the Nvidia PCI device
> > (01:00.0 in your case) should be enabled.
> 
> Yes, I agree. Why is the Nvidia card not going to DynOff when on AC and not
> being used then? I can force it by using powertop, but why does this not
> happen automatically?

Please show the output of "lspci -nn". If you boot your laptop with an external display plugged in, you will additionally have an HDMI audio function (01:00.1) which must also have runtime PM enabled for the whole thing to power down.

In powertop, what are the states for the Nvidia PCI device(s) and its root parent port before you change anything?

> Ok, but then I don't understand why without pcie_port_pm=off I get a higher
> power consumption? I mean, the parent device should power down the entire
> PCIe port incl. the Nvidia card?

See above, if there are other children, then these must also have runtime PM enabled or the parent will refuse to enter suspend.

> And before linux kernel 4.8 (I think) pcie_port_pm=off was not needed,
> so what happened?

Currently there are two methods to turn off the power on modern Nvidia devices:
 - ACPI "DSM". The nouveau driver will call a special device-specific ACPI method during runtime PM transitions.
 - ACPI Power Resources (since Linux 4.8). This means that when all devices (PCI root port and its children) are idle, the power resources can be turned off. This is a standard method that does not require a specific video driver, hence if you do not load nouveau but enable runtime PM, it will also work.

Why is the second method preferred? The second method is the standard one since Windows 8 and vendors might not test the former method. The former method is known to result in memory corruption issues on some models (bug 78530) and in other cases it will not fully reduce power consumption (resulting in increased heat, more fan noise and lower battery life).

> > Any reason why you boot with acpi_osi=! acpi_osi=Linux? If you want to
> > prevent a lockup that occurs with Clevo P6xxRx models, try
> > acpi_osi="!Windows 2015"
> > (see https://bugzilla.kernel.org/show_bug.cgi?id=156341)
> 
> Because while acpi_osi="!Windows 2015" does prevent the lockup, I still have
> some Fn keys that are not working (notably those that adjust screen
> brightness, but others as well). With acpi_osi=! acpi_osi=Linux all of my Fn
> keys are working, incl. no lockup.

Upgrade to kernel 4.10 or newer, then your brightness keys should work without forcing acpi_osi=Linux. See https://bugzilla.kernel.org/show_bug.cgi?id=123651
Comment 6 Benny Ammitzbøll 2017-07-06 12:25:35 UTC
(In reply to Peter Wu from comment #5)
> (In reply to Benny Ammitzbøll from comment #4)
> > (In reply to Peter Wu from comment #3)
> Please show the output of "lspci -nn". If you boot your laptop with an
> external display plugged in, you will additionally have a HDMI audio
> function (01:00.1) which must also have runtime PM enabled for the whole
> thing to power down.

I don't boot my laptop with any external display plugged in.

tux_xc1706 ~ # lspci -nn
00:00.0 Host bridge [0600]: Intel Corporation Skylake Host Bridge/DRAM Registers [8086:1910] (rev 07)
00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16) [8086:1901] (rev 07)
00:02.0 VGA compatible controller [0300]: Intel Corporation HD Graphics 530 [8086:191b] (rev 06)
00:14.0 USB controller [0c03]: Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller [8086:a12f] (rev 31)
00:16.0 Communication controller [0780]: Intel Corporation Sunrise Point-H CSME HECI #1 [8086:a13a] (rev 31)
00:17.0 SATA controller [0106]: Intel Corporation Sunrise Point-H SATA Controller [AHCI mode] [8086:a103] (rev 31)
00:1c.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #5 [8086:a114] (rev f1)
00:1c.5 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #6 [8086:a115] (rev f1)
00:1d.0 PCI bridge [0604]: Intel Corporation Sunrise Point-H PCI Express Root Port #9 [8086:a118] (rev f1)
00:1f.0 ISA bridge [0601]: Intel Corporation Sunrise Point-H LPC Controller [8086:a14e] (rev 31)
00:1f.2 Memory controller [0580]: Intel Corporation Sunrise Point-H PMC [8086:a121] (rev 31)
00:1f.3 Audio device [0403]: Intel Corporation Sunrise Point-H HD Audio [8086:a170] (rev 31)
00:1f.4 SMBus [0c05]: Intel Corporation Sunrise Point-H SMBus [8086:a123] (rev 31)
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce GTX 970M] [10de:13d8] (rev a1)
02:00.0 Unassigned class [ff00]: Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader [10ec:5287] (rev 01)
02:00.1 Ethernet controller [0200]: Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller [10ec:8168] (rev 12)
03:00.0 Network controller [0280]: Intel Corporation Wireless 8260 [8086:24f3] (rev 3a)
04:00.0 Non-Volatile memory controller [0108]: Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951 [144d:a802] (rev 01)

> In powertop, what are the states for the Nvidia PCI device(s) and its root
> parent port before you change anything?

In the "Tunables" tab? They're all "Bad" when on AC (until I change them to Good):

   Bad           Enable SATA link power management for host2
   Bad           Enable SATA link power management for host3
   Bad           Enable SATA link power management for host1
   Bad           Enable Audio codec power management
   Bad           NMI watchdog should be turned off
   Bad           VM writeback timeout
   Bad           Enable SATA link power management for host0
   Bad           Runtime PM for I2C Adapter i2c-21 (nvkm-0000:01:00.0-aux-0009)
   Bad           Runtime PM for I2C Adapter i2c-9 (nvkm-0000:01:00.0-aux-0003)
   Bad           Runtime PM for I2C Adapter i2c-23 (0000:01:00.0)
   Bad           Runtime PM for I2C Adapter i2c-2 (i915 gmbus dpb)
   Bad           Runtime PM for I2C Adapter i2c-6 (nvkm-0000:01:00.0-bus-0001)
   Bad           Runtime PM for I2C Adapter i2c-20 (nvkm-0000:01:00.0-bus-0009)
   Bad           Autosuspend for USB device BisonCam, NB Pro [Generic]
   Bad           Autosuspend for USB device EgisTec_ES603 [EgisTec]
   Bad           Autosuspend for USB device xHCI Host Controller [usb1]
   Bad           Autosuspend for USB device USB Receiver [Logitech]
   Bad           Runtime PM for I2C Adapter i2c-22 (0000:01:00.0)
   Bad           Autosuspend for unknown USB device 1-8 (8087:0a2b)
   Bad           Runtime PM for I2C Adapter i2c-7 (nvkm-0000:01:00.0-bus-0002)
   Bad           Autosuspend for USB device xHCI Host Controller [usb2]
   Bad           Runtime PM for I2C Adapter i2c-3 (i915 gmbus dpd)
   Bad           Runtime PM for I2C Adapter i2c-0 (SMBus I801 adapter at f040)
   Bad           Runtime PM for I2C Adapter i2c-1 (i915 gmbus dpc)
   Bad           Runtime PM for I2C Adapter i2c-19 (nvkm-0000:01:00.0-aux-0008)
   Bad           Runtime PM for I2C Adapter i2c-11 (nvkm-0000:01:00.0-aux-0004)
   Bad           Runtime PM for I2C Adapter i2c-12 (nvkm-0000:01:00.0-bus-0005)
   Bad           Runtime PM for I2C Adapter i2c-10 (nvkm-0000:01:00.0-bus-0004)
   Bad           Runtime PM for I2C Adapter i2c-14 (nvkm-0000:01:00.0-bus-0006)
   Bad           Runtime PM for I2C Adapter i2c-16 (nvkm-0000:01:00.0-bus-0007)
   Bad           Runtime PM for I2C Adapter i2c-18 (nvkm-0000:01:00.0-bus-0008)
   Bad           Runtime PM for I2C Adapter i2c-8 (nvkm-0000:01:00.0-bus-0003)
   Bad           Runtime PM for I2C Adapter i2c-15 (nvkm-0000:01:00.0-aux-0006)
   Bad           Runtime PM for I2C Adapter i2c-17 (nvkm-0000:01:00.0-aux-0007)
   Bad           Runtime PM for I2C Adapter i2c-5 (nvkm-0000:01:00.0-bus-0000)
   Bad           Runtime PM for I2C Adapter i2c-13 (nvkm-0000:01:00.0-aux-0005)
   Bad           Runtime PM for PCI Device Realtek Semiconductor Co., Ltd. RTL8111/8168/8411 PCI Express Gigabit Ethernet Controller
>> Bad           Runtime PM for PCI Device NVIDIA Corporation GM204M [GeForce GTX 970M]                                 
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H PCI Express Root Port #6
   Bad           Runtime PM for PCI Device Realtek Semiconductor Co., Ltd. RTL8411B PCI Express Card Reader
   Bad           Runtime PM for PCI Device Intel Corporation HD Graphics 530
   Bad           Runtime PM for PCI Device Intel Corporation Skylake PCIe Controller (x16)
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H LPC Controller
   Bad           Runtime PM for PCI Device Intel Corporation Wireless 8260
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H PMC
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H SATA Controller [AHCI mode]
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H CSME HECI #1
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H USB 3.0 xHCI Controller
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H PCI Express Root Port #5
   Bad           Runtime PM for PCI Device Samsung Electronics Co Ltd NVMe SSD Controller SM951/PM951
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H PCI Express Root Port #9
   Bad           Runtime PM for PCI Device Intel Corporation Skylake Host Bridge/DRAM Registers
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H HD Audio
   Bad           Runtime PM for PCI Device Intel Corporation Sunrise Point-H SMBus
   Good          Wireless Power Saving for interface wlan0
   Good          Bluetooth device interface status
   Good          Wake-on-lan status for device wlan0
   Good          Wake-on-lan status for device eth0

> Currently there are two methods to turn off the power on modern Nvidia
> devices:
>  - ACPI "DSM". The nouveau driver will call a special device-specific ACPI
> method  during runtime PM transitions.
>  - ACPI Power Resources (since Linux 4.8). This means that when all devices
> (PCI root port and its children) are idle, the power resources can be turned
> off. This is a standard method that does not require a specific video
> driver, hence if you do not load nouveau but enable runtime PM, it will also
> work.
> 
> Why is the second method preferred? The second method is the standard one
> since Windows 8 and vendors might not test the former method. The former
> method is known to result in memory corruption issues on some models (bug
> 78530) and in other cases it will not fully reduce power consumption
> (resulting in increased heat, more fan noise and lower battery life).

Ok, but you say that I can't use the second (preferred) option when I load the nouveau driver, so I must use pcie_port_pm=off. Shouldn't the nouveau driver be updated to support the second option instead?

> Upgrade to kernel 4.10 or newer, then your brightness keys should work
> without forcing acpi_osi=Linux. See
> https://bugzilla.kernel.org/show_bug.cgi?id=123651

Not so, I am already on kernel 4.9.34:

tux_xc1706 ~ # uname -a
Linux tux_xc1706 4.9.34-gentoo #1 SMP Sat Jul 1 16:26:00 CEST 2017 x86_64 Intel(R) Core(TM) i7-6700HQ CPU @ 2.60GHz GenuineIntel GNU/Linux
Comment 7 Peter Wu 2017-07-06 15:49:02 UTC
(In reply to Benny Ammitzbøll from comment #6)
[..]
> 00:01.0 PCI bridge [0604]: Intel Corporation Skylake PCIe Controller (x16)
> [8086:1901] (rev 07)
[..]
> 01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GM204M [GeForce
> GTX 970M] [10de:13d8] (rev a1)
[..]
> In the "Tunables" tab? They're all "Bad" when on AC (until I change them to
> Good):
[..]
> >> Bad           Runtime PM for PCI Device NVIDIA Corporation GM204M [GeForce GTX 970M]                                 
[..]
>    Bad           Runtime PM for PCI Device Intel Corporation Skylake PCIe
> Controller (x16)

There is your problem. You have something that modifies runtime PM settings. In the default configuration, runtime PM would be enabled for both. Try removing/disabling that problematic tool.
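
To see which devices have been switched away from the default, the power/control and power/runtime_status files of the GPU and its parent port can be read directly. A rough sketch, assuming the addresses 0000:00:01.0 and 0000:01:00.0 from the lspci output quoted above (the function names and the sysfs_root parameter are illustrative):

```python
from pathlib import Path
from typing import Dict, Iterable, List


def runtime_pm_report(
    addresses: Iterable[str], sysfs_root: str = "/sys"
) -> Dict[str, Dict[str, str]]:
    """Read power/control and power/runtime_status for each PCI address."""
    report = {}
    for address in addresses:
        power = Path(sysfs_root) / "bus" / "pci" / "devices" / address / "power"
        report[address] = {
            name: (power / name).read_text().strip()
            for name in ("control", "runtime_status")
        }
    return report


def blocking_devices(report: Dict[str, Dict[str, str]]) -> List[str]:
    """Return devices whose control is not "auto" (runtime PM disabled)."""
    return [addr for addr, attrs in report.items() if attrs["control"] != "auto"]


# Usage on the affected machine:
#   report = runtime_pm_report(["0000:00:01.0", "0000:01:00.0"])
#   print(report, blocking_devices(report))
```

Running this once right after boot and again after plugging in AC would show whether something rewrites the control files in between.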

> Ok, but you say that I can't use the second (preferred) option when I load
> the nouveau driver - so I must use pcie_port_pm=off. Shouldn't the nouveau
> driver be updated to support the second option in stead?

nouveau has been updated at the same time to support the new, preferred option. But if you use pcie_port_pm=off, it will fall back to the old method.
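
Whether the option is in effect can be checked from the kernel command line. A small sketch, assuming only what is stated in this thread (with pcie_port_pm=off the old DSM method is used); uses_port_pm_off is an illustrative helper, not an existing API:

```python
def uses_port_pm_off(cmdline: str) -> bool:
    """Return True if pcie_port_pm=off appears as a kernel parameter."""
    return "pcie_port_pm=off" in cmdline.split()


# Usage on a running system:
#   with open("/proc/cmdline") as handle:
#       print(uses_port_pm_off(handle.read()))
print(uses_port_pm_off("root=/dev/sda2 acpi_osi=Linux pcie_port_pm=off"))
```

A False result does not by itself prove that ACPI Power Resources are used, since that also depends on the kernel version and the firmware.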

> > Upgrade to kernel 4.10 or newer, then your brightness keys should work
> > without forcing acpi_osi=Linux. See
> > https://bugzilla.kernel.org/show_bug.cgi?id=123651
> 
> Not so, I am already on kernel 4.9.34:

4.9 is too old, use 4.10 (TEN) or newer.
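
Kernel versions compare component by component, not as decimal numbers, so 4.10 is newer than 4.9. A short sketch of that comparison, using the release string from the uname output earlier in this thread (kernel_version is an illustrative helper):

```python
import re
from typing import Tuple


def kernel_version(release: str) -> Tuple[int, int]:
    """Extract (major, minor) from a release string such as "4.9.34-gentoo"."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


print(kernel_version("4.9.34-gentoo") >= (4, 10))  # False
print(kernel_version("4.10.0") >= (4, 10))         # True
```

On a running system, platform.release() from the standard library returns the release string to pass in.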
Comment 8 Benny Ammitzbøll 2017-07-09 14:41:05 UTC
(In reply to Peter Wu from comment #7)
> > >> Bad           Runtime PM for PCI Device NVIDIA Corporation GM204M [GeForce GTX 970M]                                 
> [..]
> >    Bad           Runtime PM for PCI Device Intel Corporation Skylake PCIe
> > Controller (x16)
> 
> There is your problem. You have something that modifies runtime PM settings.
> In the default configuration, runtime PM would be enabled for both. Try
> removing/disabling that problematic tool.

Hmm, ok, I do have laptop_mode enabled, so I tried stopping that, but still see "Bad" for those when on AC?

> > Not so, I am already on kernel 4.9.34:
> 
> 4.9 is too old, use 4.10 (TEN) or newer.

Arrg, sorry, seems I can't read :-)
Comment 9 Peter Wu 2017-07-09 14:51:51 UTC
(In reply to Benny Ammitzbøll from comment #8)

> Hmm, ok, I do have laptop_mode enabled, so I tried stopping that, but still
> see "Bad" for those when on AC?

Have you removed the pcie_port_pm=off option?
Is the nouveau driver loaded?
Comment 10 Benny Ammitzbøll 2017-07-09 16:37:44 UTC
(In reply to Peter Wu from comment #9)
> (In reply to Benny Ammitzbøll from comment #8)
> 
> > Hmm, ok, I do have laptop_mode enabled, so I tried stopping that, but still
> > see "Bad" for those when on AC?
> 
> Have you removed the pcie_port_pm=off option?
> Is the nouveau driver loaded?

Didn't know I had to do that. Without the pcie_port_pm=off option I do get "Good" initially (on AC). On battery, power use is high (28W vs. 16-18W) even though the state is DynOff. When I plug in AC again, the Nvidia card is "Bad" again (and DynPwr).

So for now I think I'll stick with pcie_port_pm=off.
Comment 11 Benny Ammitzbøll 2018-04-06 14:24:19 UTC
Have to use pcie_port_pm=off, but I can live with that.

