Bug 107447 - regression in thermal management of Quadro FX 1500
Status: RESOLVED MOVED
Alias: None
Product: Mesa
Classification: Unclassified
Component: Drivers/DRI/nouveau
Version: unspecified
Hardware: x86-64 (AMD64) Linux (All)
Importance: medium normal
Assignee: Nouveau Project
QA Contact: Nouveau Project
Reported: 2018-08-01 15:51 UTC by Thomas Blume
Modified: 2019-09-18 20:47 UTC

Attachments
dmesg kernel 4.4 (67.72 KB, text/plain), 2018-08-01 15:51 UTC, Thomas Blume
dmesg kernel 4.12 (70.14 KB, text/plain), 2018-08-01 15:51 UTC, Thomas Blume
Test fix patch (1.35 KB, patch), 2018-09-03 15:23 UTC, Takashi Iwai

Description Thomas Blume 2018-08-01 15:51:27 UTC
Created attachment 140926
dmesg kernel 4.4

On my machine with an:

  Model: "nVidia Quadro FX 1500"
  Vendor: pci 0x10de "nVidia Corporation"
  Device: pci 0x029e "Quadro FX 1500"
  SubVendor: pci 0x10de "nVidia Corporation"
  SubDevice: pci 0x032c 
  Revision: 0xa1
  Driver: "nouveau"
  Driver Modules: "drm"

graphics card, the GPU fan starts running at high speed.
The issue started after installing openSUSE Leap 15 with kernel version 4.12.
The same issue is visible using kernel version 4.17.
It was working fine on older openSUSE versions, up to kernel version 4.4.
Installing kernel version 4.4 on Leap 15 makes the issue go away.
When booting with kernel 4.12, the nouveau driver debug logs show:

-->

 grep therm /mnt/dmesg-4_12.txt
[    5.985658] nouveau 0000:01:00.0: therm: FAN control: PWM
[    5.985667] nouveau 0000:01:00.0: therm: parsing the fan table failed
[    5.985680] nouveau 0000:01:00.0: therm: fan management: automatic
[    5.985687] nouveau 0000:01:00.0: therm: FAN target request: 90%
[    5.985692] nouveau 0000:01:00.0: therm: FAN target: 90
[    5.985697] nouveau 0000:01:00.0: therm: FAN update: 23
[    5.985707] nouveau 0000:01:00.0: therm: internal sensor: yes
[    6.005559] nouveau 0000:01:00.0: therm: programmed thresholds [ 90(3), 95(3), 130(2), 135(5) ]
[    6.485750] nouveau 0000:01:00.0: therm: FAN update: 26
[    6.985841] nouveau 0000:01:00.0: therm: FAN update: 29
[    7.485929] nouveau 0000:01:00.0: therm: FAN update: 32
[    7.986009] nouveau 0000:01:00.0: therm: FAN update: 35
[    8.486100] nouveau 0000:01:00.0: therm: FAN update: 38
[    8.986193] nouveau 0000:01:00.0: therm: FAN update: 41
[    9.486270] nouveau 0000:01:00.0: therm: FAN update: 44
[    9.987905] nouveau 0000:01:00.0: therm: FAN update: 47
[   10.489188] nouveau 0000:01:00.0: therm: FAN update: 50
[   10.990485] nouveau 0000:01:00.0: therm: FAN update: 53
[   11.491779] nouveau 0000:01:00.0: therm: FAN update: 56
[   11.993032] nouveau 0000:01:00.0: therm: FAN update: 59
[   12.494276] nouveau 0000:01:00.0: therm: FAN update: 62
[   12.995621] nouveau 0000:01:00.0: therm: FAN update: 65
[   13.497070] nouveau 0000:01:00.0: therm: FAN update: 68
[   13.998516] nouveau 0000:01:00.0: therm: FAN update: 71
[   14.500055] nouveau 0000:01:00.0: therm: FAN update: 74
[   15.001850] nouveau 0000:01:00.0: therm: FAN update: 77
[   15.505237] nouveau 0000:01:00.0: therm: FAN update: 80
[   16.006924] nouveau 0000:01:00.0: therm: FAN update: 83
[   16.508429] nouveau 0000:01:00.0: therm: FAN update: 86
[   17.009946] nouveau 0000:01:00.0: therm: FAN update: 89
[   17.511480] nouveau 0000:01:00.0: therm: FAN update: 90
--<

With kernel 4.4, it only shows:

-->
 grep therm /mnt/dmesg-4_4.txt 
[    7.547113] nouveau 0000:01:00.0: therm: FAN control: PWM
[    7.547117] nouveau 0000:01:00.0: therm: parsing the fan table failed
[    7.547121] nouveau 0000:01:00.0: therm: fan management: automatic
[    7.547124] nouveau 0000:01:00.0: therm: internal sensor: yes
[    7.566971] nouveau 0000:01:00.0: therm: programmed thresholds [ 90(3), 95(3), 130(2), 135(5) ]
--<

Attaching dmesg with nouveau debug logs for kernel 4.4 and kernel 4.12.
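
A minimal sketch, assuming the dmesg line format shown in the logs above, of how the "FAN update" values could be pulled out of a saved log to make the ramp easier to see. The names fan_updates, FAN_UPDATE and SAMPLE are made up for illustration; the sample lines are a subset copied from the kernel 4.12 log.

```python
import re

# A subset of lines copied from the kernel 4.12 log above.
SAMPLE = """\
[    5.985697] nouveau 0000:01:00.0: therm: FAN update: 23
[    6.485750] nouveau 0000:01:00.0: therm: FAN update: 26
[    6.985841] nouveau 0000:01:00.0: therm: FAN update: 29
[   17.009946] nouveau 0000:01:00.0: therm: FAN update: 89
[   17.511480] nouveau 0000:01:00.0: therm: FAN update: 90
"""

# Captures the dmesg timestamp and the number after "FAN update:".
FAN_UPDATE = re.compile(r"^\[\s*(\d+\.\d+)\].*therm: FAN update: (\d+)\s*$")


def fan_updates(text):
    """Return (timestamp_seconds, value) pairs for each 'FAN update' line."""
    updates = []
    for line in text.splitlines():
        match = FAN_UPDATE.match(line)
        if match:
            updates.append((float(match.group(1)), int(match.group(2))))
    return updates


# Print the extracted ramp, one update per line.
for timestamp, value in fan_updates(SAMPLE):
    print(f"{timestamp:10.3f}s  {value}")
```

Run against the full kernel 4.12 log, this should list the value rising by 3 roughly every 0.5 s until it reaches the requested 90, while the kernel 4.4 log yields no updates at all.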
Comment 1 Thomas Blume 2018-08-01 15:51:51 UTC
Created attachment 140927
dmesg kernel 4.12
Comment 2 Takashi Iwai 2018-08-03 10:18:11 UTC
We confirmed that the commit below introduced the regression:
commit 800efb4c2857ec543fdc33585bbcb1fd5ef28337
    drm/nouveau/drm/therm/fan: add a fallback if no fan control is specified in the vbios

Reverting the commit fixes the problem on Thomas' machine.
Comment 3 caguduzexi 2018-08-12 23:23:21 UTC
Takashi Iwai, thanks a lot for finding a solution for that!
I have had this issue for over a year and nothing has happened. I would be willing to switch over to SUSE if this finally got fixed and the GPU fan no longer spun at 100% the whole time.
Here is one of my bug reports:

https://bugs.freedesktop.org/show_bug.cgi?id=102352
Comment 4 Takashi Iwai 2018-08-13 07:19:47 UTC
I *guess* that a better fix would be to check the error from the fan-table parsing and avoid the linear fallback when it failed, instead of reverting the whole commit.
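
The idea above could be sketched roughly as follows. This is an illustrative Python model, not nouveau code: FanMode, choose_fan_mode and both flags are hypothetical names, and the real driver logic is written in C and may be structured quite differently.

```python
from enum import Enum


class FanMode(Enum):
    # Hypothetical mode names for illustration; not the driver's identifiers.
    UNMANAGED = "unmanaged"  # leave the fan alone
    LINEAR = "linear"        # linear fallback
    TABLE = "table"          # fan control as described by the VBIOS fan table


def choose_fan_mode(parse_ok, table_has_fan_control):
    """Pick a fan mode, skipping the linear fallback when parsing failed.

    parse_ok: assumed flag, True if the fan-table parse succeeded.
    table_has_fan_control: assumed flag, True if the table names a fan mode.
    """
    if not parse_ok:
        # Parsing failed: do not guess a fallback.
        return FanMode.UNMANAGED
    if table_has_fan_control:
        return FanMode.TABLE
    # Table parsed fine but specifies no fan control: use the fallback.
    return FanMode.LINEAR
```

Under this model, a machine that logs "parsing the fan table failed" would end up in the unmanaged case instead of being driven by the linear fallback.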
Comment 5 Takashi Iwai 2018-09-03 15:22:32 UTC
One problem seems to be that the thermal target is set only once by the linear fallback, and then forgotten.  The patch below "fixes" that, at least, by continuing to update the fan with the linear fallback.

But, according to Thomas, this still seems too high.  I'm not sure whether it's the correct behavior or it was just because the fan never worked in the past.
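
To illustrate the difference between a target that is set once and one that keeps being updated, here is a small Python model. It is only a sketch: linear_target, run, and every temperature and fan value in it are hypothetical, and the linear interpolation is an assumption about what a "linear fallback" means, not a copy of the driver's code.

```python
def linear_target(temp_c, min_temp=40, max_temp=85, min_duty=30, max_duty=100):
    """Map a temperature to a fan target by linear interpolation.

    All default values are hypothetical and chosen only for illustration.
    """
    if temp_c <= min_temp:
        return min_duty
    if temp_c >= max_temp:
        return max_duty
    span = max_temp - min_temp
    return min_duty + (max_duty - min_duty) * (temp_c - min_temp) // span


def run(temps, keep_updating):
    """Return the fan target in effect at each temperature poll."""
    targets = []
    target = None
    for temp in temps:
        # Without keep_updating, only the first poll ever sets the target.
        if target is None or keep_updating:
            target = linear_target(temp)
        targets.append(target)
    return targets
```

With the temperatures [85, 60, 40], run(..., keep_updating=False) stays at [100, 100, 100], whereas run(..., keep_updating=True) follows the temperature down to [100, 61, 30].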
Comment 6 Takashi Iwai 2018-09-03 15:23:35 UTC
Created attachment 141427
Test fix patch
Comment 7 GitLab Migration User 2019-09-18 20:47:00 UTC
-- GitLab Migration Automatic Message --

This bug has been migrated to freedesktop.org's GitLab instance and has been closed from further activity.

You can subscribe and participate further through the new bug through this link to our GitLab instance: https://gitlab.freedesktop.org/mesa/mesa/issues/1161.

